📊 PDF Page Counter & Size Analyzer

Last updated: June 16, 2026

See page count, dimensions, and which pages are bloating your PDF, instantly and in your browser. No upload, no server.

Why Your PDF's File Size Is Almost Never Evenly Distributed Across Pages

Open any 50-page PDF in a hex editor and count the bytes per page: you will almost never see a uniform split. A document with a scanned photo on page 3, vector diagrams on pages 11 through 14, and plain text everywhere else can have 80% of its bytes concentrated in fewer than 10% of its pages. This lopsided distribution is not a bug; it is a direct consequence of how the PDF format stores objects. Understanding that distribution before you compress or split a file is the difference between a well-targeted optimization and a blind operation that either misses the real bloat or breaks the document's logical structure.

How PDF Object Storage Creates Uneven Page Weight

A PDF file is a collection of numbered objects (streams, dictionaries, arrays, and references) assembled at write time and indexed in a cross-reference table near the end of the file. Each page is itself a dictionary object that references other objects: its content stream (the actual drawing instructions), its resource dictionary (fonts, color spaces, patterns), and any image XObjects, whether used by that page alone or shared across pages.

Image XObjects are the single largest driver of uneven file size. A 300 DPI scanned photograph stored as a JPEG stream might occupy 400 KB as a single object. If that image appears only on one page, that page's logical "weight" dwarfs every other page in the document. Conversely, a font program embedded once and shared across all pages gets counted against the file's overhead, not any individual page, which means per-page size estimates always carry some ambiguity about how to allocate shared resources.
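To make the allocation ambiguity concrete, here is a minimal Python sketch of two ways to attribute bytes to pages. The object IDs, sizes, and the `page_weights` helper are hypothetical illustrations, not this tool's actual code; the sketch assumes you already know each object's size and which pages reference it.

```python
from collections import defaultdict

def page_weights(objects, pages, share_shared=True):
    """Attribute object bytes to pages.

    objects: {object_id: size_in_bytes}
    pages:   {page_number: [object_id, ...]} (objects each page references)
    share_shared: if True, split a shared object evenly among the pages
                  that reference it; if False, leave it out as file overhead.
    """
    # Count how many pages reference each object.
    ref_counts = defaultdict(int)
    for refs in pages.values():
        for obj_id in set(refs):
            ref_counts[obj_id] += 1

    weights = {}
    for page_no, refs in pages.items():
        total = 0.0
        for obj_id in set(refs):
            users = ref_counts[obj_id]
            if users == 1:
                total += objects[obj_id]          # exclusive to this page
            elif share_shared:
                total += objects[obj_id] / users  # even split among users
        weights[page_no] = total
    return weights

# Hypothetical document: object 10 is a 400 KB scan used only on page 2,
# object 20 is a 90 KB font program shared by all three pages.
objects = {10: 400_000, 20: 90_000, 31: 2_000, 32: 3_000, 33: 2_500}
pages = {1: [31, 20], 2: [32, 10, 20], 3: [33, 20]}

print(page_weights(objects, pages, share_shared=True))
print(page_weights(objects, pages, share_shared=False))
```

Either way page 2 stands out, which is the point: the choice of allocation rule shifts the small pages noticeably but rarely changes which page is the outlier.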

Vector content (paths, curves, text drawn as outlines) is dramatically smaller than raster content at equivalent visual complexity. A full-page architectural drawing built entirely from Bezier paths might occupy 20 KB; a photograph of the same building at press resolution might occupy 4 MB. This 200x ratio explains why PDFs exported from design tools with mixed content behave so unpredictably when you try to split or compress them without first profiling the per-page weight.

The Three Categories of PDF Page Bloat

In practice, oversized pages in a PDF fall into three categories, each requiring a different fix:

Embedded high-resolution raster images. This is the most common case. Typical examples are a photographer's portfolio PDF, a scanned contract, or a presentation exported from PowerPoint with uncompressed screenshots. The fix is downsampling the images (reducing DPI from 300 to 150 for screen viewing) or switching to a more aggressive JPEG compression quality. Identifying which pages carry the large images before compressing lets you target the operation and verify the output.

Embedded font subsets that are disproportionately large. Some CJK (Chinese, Japanese, Korean) font subsets can run 2–5 MB per embedded font. A document that switches typefaces on a single decorative page (a title page with a custom display font, for example) may carry most of its font overhead in that one page's associated resource dictionary. Splitting such a document without re-embedding fonts will leave the font data orphaned or duplicated.

Transparency and compositing groups. PDFs that use transparency blending (drop shadows, gradients over photographs, soft masks) generate additional soft-mask streams and form XObjects that add hidden bulk. These are invisible in the page count but appear as extra stream objects in the file. They are also the reason why "flatten transparency" is a common pre-press step: it eliminates these streams at the cost of rasterizing the affected regions.
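For the first category, the payoff from downsampling can be estimated with simple arithmetic: halving the DPI quarters the pixel count. The Python sketch below shows this for a hypothetical full-page scan on US Letter; the helper names are illustrative, and the compressed JPEG size will not scale exactly with pixel count, so treat the ratio as a rough guide.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels needed to cover a region at a given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

def downsample_ratio(old_dpi, new_dpi):
    """Fraction of the original pixels that remain after downsampling."""
    return (new_dpi / old_dpi) ** 2

# Hypothetical full-page scan on US Letter (8.5 x 11 inches).
before = pixel_count(8.5, 11, 300)
after = pixel_count(8.5, 11, 150)

print(before, after, downsample_ratio(300, 150))
```

Going from 300 to 150 DPI keeps a quarter of the pixels, which is why a single downsampled scan page can shrink a file more than recompressing every text page combined.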

What Page Dimensions Tell You About Compression Potential

A page's MediaBox dimensions (stored in points, where 72 points equal one inch) do not directly indicate file size, but they provide important context for interpreting size. A 210×297 mm (A4) page at 50 KB is almost certainly text-only or contains only vector graphics. A page of the same dimensions at 2 MB almost certainly contains an embedded raster image, and that image is probably at a resolution far higher than the screen or printer requires.

Standard paper sizes correspond to well-known point values: A4 is 595.28×841.89 pt, US Letter is 612×792 pt, A3 is 841.89×1190.55 pt. Pages that deviate from these values (say, 794×1124 pt) are often the result of exporting from a tool that added margins differently, or scanning with a flatbed that captured a slightly non-standard crop. Non-standard dimensions matter when you are splitting a PDF and then printing: pages that are slightly oversized will be cropped or scaled by some printers in unexpected ways.
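As a rough illustration of how a MediaBox can be matched against the sizes listed above, the following Python sketch converts points to millimetres and looks for a standard size within a tolerance. The function names and the 1 pt tolerance are assumptions chosen for the example, not values taken from this tool.

```python
MM_PER_POINT = 25.4 / 72  # 72 points per inch, 25.4 mm per inch

# Point values for the sizes mentioned above, as (short side, long side).
STANDARD_SIZES_PT = {
    "A4": (595.28, 841.89),
    "US Letter": (612.0, 792.0),
    "A3": (841.89, 1190.55),
}

def points_to_mm(pt):
    return pt * MM_PER_POINT

def classify_page(width_pt, height_pt, tolerance_pt=1.0):
    """Return the name of the matching standard size, or None.

    Orientation is ignored: the sides are sorted before comparing.
    tolerance_pt is an assumed allowance for rounding by PDF writers.
    """
    short, long_side = sorted((width_pt, height_pt))
    for name, (std_short, std_long) in STANDARD_SIZES_PT.items():
        if (abs(short - std_short) <= tolerance_pt
                and abs(long_side - std_long) <= tolerance_pt):
            return name
    return None

print(classify_page(595, 842))    # rounded A4 values
print(classify_page(792, 612))    # US Letter stored as landscape
print(classify_page(794, 1124))   # the non-standard example from the text
print(round(points_to_mm(595.28), 1))
```

A tolerance is needed because many generators round A4 to 595×842, and a strict equality check would flag those pages as non-standard.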

Landscape pages embedded in a portrait document are a common source of confusion when splitting. A 200-page report with three landscape pages (tables or charts rotated 90°) must be split with awareness of those rotations, or the resulting sub-files will have inconsistent reading orientations.
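One way to detect such pages before splitting is to combine the MediaBox with the page's /Rotate entry, which the PDF format uses to express rotation in multiples of 90 degrees. The Python sketch below assumes both values have already been extracted; the function names are hypothetical.

```python
def displayed_size(media_box, rotate=0):
    """Width and height as a viewer would show them.

    media_box: [llx, lly, urx, ury] in points.
    rotate: the page's /Rotate value (a multiple of 90; 0 if absent).
    """
    llx, lly, urx, ury = media_box
    width, height = abs(urx - llx), abs(ury - lly)
    if rotate % 180 == 90:  # 90 or 270 swaps the displayed sides
        width, height = height, width
    return width, height

def orientation(media_box, rotate=0):
    width, height = displayed_size(media_box, rotate)
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"

print(orientation([0, 0, 595, 842]))        # portrait
print(orientation([0, 0, 595, 842], 90))    # landscape via /Rotate
print(orientation([0, 0, 842, 595]))        # landscape via MediaBox
```

Checking both sources matters because a page can be landscape either through a wide MediaBox or through a portrait MediaBox with a rotation flag.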

The Right Workflow: Analyze Before You Compress or Split

The correct sequence for any PDF optimization task is: profile first, act second, verify third. Profiling means knowing the page count, the dimensions of every page, and the approximate byte weight of each page before running any destructive operation.

For compression, the profile tells you which pages to focus the operation on. If pages 1, 2, and 4 each account for less than 1% of the file size but page 3 accounts for 60%, compressing the entire file with a uniform quality setting is wasteful: page 3 needs targeted image downsampling, and the other pages should be left untouched to preserve text sharpness.

For splitting, the profile tells you whether the split will produce balanced sub-files or wildly uneven ones. A 100-page PDF split at page 50 might produce a 2 MB first half and a 45 MB second half if all the high-resolution images happen to fall in the back matter. Knowing this in advance lets you choose a different split point or inform the recipient about the asymmetry.
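Given per-page size estimates, a better split point can be found by comparing cumulative sizes. The Python sketch below is one simple approach under the assumption that page weights simply add up; in a real file each half may also carry its own copy of shared fonts, so the actual sub-file sizes will differ somewhat. The `best_split` helper is illustrative only.

```python
from itertools import accumulate

def best_split(page_sizes):
    """Find the split that best balances the two halves by bytes.

    Returns (k, first_bytes, second_bytes), meaning "split after page k"
    with pages numbered from 1.
    """
    if len(page_sizes) < 2:
        raise ValueError("need at least two pages to split")
    total = sum(page_sizes)
    prefix = list(accumulate(page_sizes))  # prefix[i] = bytes in pages 1..i+1
    # Minimize the absolute difference between the two halves.
    k = min(range(1, len(page_sizes)),
            key=lambda i: abs(total - 2 * prefix[i - 1]))
    first = prefix[k - 1]
    return k, first, total - first

# Hypothetical sizes in KB: light front matter, two image-heavy pages at the end.
sizes = [10, 10, 10, 10, 200, 200]
print(best_split(sizes))
```

Splitting this example at the midpoint by page count (after page 3) would give 30 KB and 410 KB, while the byte-balanced split lands much closer to even.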

For archiving, the dimension data tells you whether any pages have non-standard sizes that might cause problems downstream. Conformance to an archival standard such as PDF/A depends on constraints on embedded content (fonts, color, encryption), so a page-size profile complements a conformance check rather than replacing it.

How Browser-Based PDF Analysis Works Without a Server

The PDF format is a text-based object graph with binary streams. Reading the raw bytes of a PDF file in a browser, using the FileReader API to load an ArrayBuffer, gives you access to the same cross-reference table and object dictionary that a full PDF library would parse. From that raw text you can extract the /Count entry in the Pages dictionary (the authoritative page count), every /MediaBox array (page dimensions in points), and locate stream delimiters (stream/endstream keywords) to estimate per-stream byte sizes.
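The same idea can be sketched outside the browser. The Python example below is a minimal illustration of regex-based extraction over raw bytes, not the tool's actual JavaScript; the patterns and the `quick_profile` name are assumptions, and the approach only handles files whose page objects are stored uncompressed.

```python
import re

OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
PAGES_TYPE_RE = re.compile(rb"/Type\s*/Pages\b")
COUNT_RE = re.compile(rb"/Count\s+(\d+)")
MEDIABOX_RE = re.compile(
    rb"/MediaBox\s*\[\s*([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*\]")
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.S)

def quick_profile(data):
    """Return (page_count, media_boxes, stream_sizes) from raw PDF bytes."""
    # The root of the page tree holds the largest /Count among /Pages nodes.
    counts = []
    for match in OBJ_RE.finditer(data):
        body = match.group(1)
        if PAGES_TYPE_RE.search(body):
            count = COUNT_RE.search(body)
            if count:
                counts.append(int(count.group(1)))
    page_count = max(counts) if counts else None

    boxes = [tuple(float(v) for v in m.groups())
             for m in MEDIABOX_RE.finditer(data)]
    stream_sizes = [len(m.group(1)) for m in STREAM_RE.finditer(data)]
    return page_count, boxes, stream_sizes

# A tiny hand-written PDF fragment (no xref table; enough for the regexes).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
    b" /Contents 5 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595.28 841.89] >>\n"
    b"endobj\n"
    b"5 0 obj\n<< /Length 10 >>\nstream\nBT (Hi) ET\nendstream\nendobj\n"
)

print(quick_profile(sample))
```

Restricting the /Count search to objects typed /Pages avoids picking up the unrelated /Count entries that outline (bookmark) dictionaries also use.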

This approach works for the vast majority of PDFs produced by office tools, design applications, and scanners. The exception is PDF 1.5+ cross-reference streams, a compressed binary format for the xref table used by some modern generators. Files using cross-reference streams require an inflate (decompression) step to parse the object index, which is beyond what a lightweight regex-based parser handles. For those files, the tool will report the limitation clearly rather than silently producing incorrect results.
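Detecting that limitation up front is straightforward, because a cross-reference stream is an ordinary object whose dictionary declares `/Type /XRef`, while a classic table starts with the `xref` keyword on its own line. The Python sketch below shows one possible check; the function name and return labels are hypothetical, and this is not necessarily how the tool decides.

```python
import re

XREF_STREAM_RE = re.compile(rb"/Type\s*/XRef\b")
# "xref" at the start of a line, followed by a subsection header such as "0 6".
# Requiring a line start keeps "startxref" from matching.
CLASSIC_XREF_RE = re.compile(rb"(?:^|[\r\n])xref\s+\d+\s+\d+")

def xref_style(data):
    """Classify how a PDF indexes its objects, from raw bytes."""
    has_stream = XREF_STREAM_RE.search(data) is not None
    has_table = CLASSIC_XREF_RE.search(data) is not None
    if has_stream and has_table:
        return "hybrid"
    if has_stream:
        return "stream"
    if has_table:
        return "table"
    return "unknown"

classic = (b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
           b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
           b"trailer\n<< /Size 2 >>\nstartxref\n30\n%%EOF")
modern = (b"%PDF-1.5\n7 0 obj\n<< /Type /XRef /Size 8 /W [1 2 1] >>\n"
          b"stream\n...\nendstream\nendobj\nstartxref\n9\n%%EOF")

print(xref_style(classic), xref_style(modern))
```

A parser that sees "stream" or "unknown" here can stop and report the limitation instead of returning a page count of zero.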

The key advantage of running this analysis entirely in the browser is privacy: your document's contents never leave your device. For legal contracts, financial statements, medical records, or any confidential document, this matters more than the marginal capability difference between a client-side and server-side parser.

FAQ

How accurate is the per-page size estimate?
The per-page size is estimated from stream byte ranges found in the raw PDF file. For simple PDFs (mostly text, images on specific pages), the estimates closely reflect actual per-page weight. For PDFs with many shared resources (fonts reused across all pages, globally referenced images), the estimate distributes shared overhead evenly, which may over-represent simple pages and under-represent complex ones. The estimate is accurate enough to identify outlier pages (pages that are 5x or 10x heavier than average), which is its primary purpose.
Why does my page count show '0' or give an error?
This usually means the PDF uses cross-reference streams (a compressed binary format introduced in PDF 1.5, used by modern Adobe tools and some print-production workflows). These files require an inflate (decompression) step to read the object index, which goes beyond what a lightweight browser-based parser can do without a PDF library. Try opening the file in Adobe Acrobat and saving it as 'PDF 1.4 compatible'; that forces an uncompressed xref table that this tool can read.
What do the 'Bloat' and 'Heavy' tags mean on individual pages?
'Bloat' means that page's estimated size is at least 70% as large as the single largest page: it is in the top tier of size consumers and should be your first target for compression. 'Heavy' means it is between 35% and 70% of the largest page's weight, so it is worth investigating but is not the primary culprit. Pages with no tag are lightweight relative to the rest of the document.
Can this tool open password-protected or encrypted PDFs?
No. In an encrypted PDF the stream and string data are encrypted, so content and image streams cannot be inspected, and any page dictionaries stored inside compressed object streams are unreadable as well. You will need to remove the password protection first (using Acrobat, PDF24, or a similar tool) before analyzing the file here.
My PDF says all pages are the same size but I know some are landscape. Why?
Page rotation in PDFs is handled by a /Rotate entry in the page dictionary rather than by swapping the MediaBox width and height. A landscape page can have the same MediaBox as a portrait page (e.g., 595×842) but a /Rotate value of 90 or 270 degrees. The MediaBox dimensions reflect the raw coordinate space, not the visual orientation. This tool reports the raw MediaBox dimensions; actual displayed orientation may differ based on the rotation flag.
Is my file uploaded to any server?
No. The entire analysis runs in your browser using the JavaScript FileReader API. Your file is read directly into memory on your device and never transmitted anywhere. This makes the tool safe to use with confidential documents, legal contracts, medical records, or any file you cannot share with a third party.
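As a closing illustration, the 'Bloat' and 'Heavy' thresholds described in the FAQ above can be sketched in a few lines of Python. This is a hypothetical reimplementation of the stated rule, not the tool's code, and it assumes the 35% and 70% boundaries are inclusive at their lower edge.

```python
def tag_pages(page_sizes):
    """Tag each page relative to the largest page in the document.

    Returns a list of "Bloat", "Heavy", or "" (no tag), one per page.
    """
    if not page_sizes:
        return []
    largest = max(page_sizes)
    tags = []
    for size in page_sizes:
        # Compare size/largest against 70% and 35% without dividing.
        if largest > 0 and size * 100 >= 70 * largest:
            tags.append("Bloat")
        elif largest > 0 and size * 100 >= 35 * largest:
            tags.append("Heavy")
        else:
            tags.append("")
    return tags

# Hypothetical per-page estimates in KB.
print(tag_pages([50, 1000, 400, 700, 349]))
```

Because the rule is relative to the largest page, a document whose pages are all the same size would tag every page 'Bloat'; the tags are most informative when the distribution is uneven.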