PDF Page Counter & Size Analyzer
See page count, dimensions, and which pages are bloating your PDF, instantly, in your browser. No upload, no server.
| # | Width × Height | Inches | Est. Size | Size Share |
|---|---|---|---|---|
Why Your PDF's File Size Is Almost Never Evenly Distributed Across Pages
Open any 50-page PDF in a hex editor and count the bytes per page: you will almost never see a uniform split. A document with a scanned photo on page 3, vector diagrams on pages 11 through 14, and plain text everywhere else can have 80% of its bytes concentrated in about 10% of its pages. This lopsided distribution is not a bug; it is a direct consequence of how the PDF format stores objects. Understanding that distribution before you compress or split a file is the difference between a well-targeted optimization and a blind operation that either misses the real bloat or breaks the document's logical structure.
How PDF Object Storage Creates Uneven Page Weight
A PDF file is a collection of numbered objects (streams, dictionaries, arrays, and references) assembled at write time and indexed in a cross-reference table near the end of the file. Each page is itself a dictionary object that references other objects: its content stream (the actual drawing instructions), its resource dictionary (fonts, color spaces, patterns), and any image XObjects, whether used by that page alone or shared across pages.
Image XObjects are the single largest driver of uneven file size. A 300 DPI scanned photograph stored as a JPEG stream might occupy 400 KB as a single object. If that image appears only on one page, that page's logical "weight" dwarfs every other page in the document. Conversely, a font program embedded once and shared across all pages gets counted against the file's overhead, not any individual page, which means per-page size estimates always carry some ambiguity about how to allocate shared resources.
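One way to resolve that ambiguity is to divide each shared object's bytes equally among the pages that reference it. The sketch below illustrates that policy only; it is an assumption for illustration, not necessarily how this tool allocates shared resources, and the function name, object IDs, and sizes are hypothetical.

```python
from collections import defaultdict


def estimate_page_weights(object_sizes, page_refs):
    """Estimate per-page bytes, splitting shared objects equally.

    object_sizes: {object_id: size_in_bytes}
    page_refs:    {page_number: set of object_ids the page references}
    """
    # Count how many pages reference each object.
    users = defaultdict(int)
    for refs in page_refs.values():
        for obj_id in refs:
            users[obj_id] += 1

    # Charge each page its equal share of every object it uses.
    return {
        page: sum(object_sizes[obj_id] / users[obj_id] for obj_id in refs)
        for page, refs in page_refs.items()
    }


# Hypothetical document: object 10 is a 400 KB image used only on page 2,
# object 11 is a 60 KB font shared by both pages, 12 and 13 are content streams.
sizes = {10: 400_000, 11: 60_000, 12: 2_000, 13: 2_000}
refs = {1: {11, 12}, 2: {10, 11, 13}}
weights = estimate_page_weights(sizes, refs)
```

With this policy the per-page estimates add up to the total size of the referenced objects, which makes the resulting "size share" column easy to sanity-check.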
Vector content (paths, curves, text drawn as outlines) is dramatically smaller than raster content at equivalent visual complexity. A full-page architectural drawing built entirely from Bezier paths might occupy 20 KB; a photograph of the same building at press resolution might occupy 4 MB. This 200x ratio explains why PDFs exported from design tools with mixed content behave so unpredictably when you try to split or compress them without first profiling the per-page weight.
The Three Categories of PDF Page Bloat
In practice, oversized pages in a PDF fall into three categories, each requiring a different fix:
Embedded high-resolution raster images. This is the most common case: a photographer's portfolio PDF, a scanned contract, or a presentation exported from PowerPoint with uncompressed screenshots. The fix is downsampling the images (reducing DPI from 300 to 150 for screen viewing) or re-encoding them at a lower JPEG quality setting. Identifying which pages carry the large images before compressing lets you target the operation and verify the output.
Embedded font subsets that are disproportionately large. Some CJK (Chinese, Japanese, Korean) font subsets can run 2–5 MB per embedded font. A document that switches typefaces on a single decorative page (a title page with a custom display font, for example) may carry most of its font overhead in that one page's associated resource dictionary. Splitting such a document without re-embedding fonts will leave the font data orphaned or duplicated.
Transparency and compositing groups. PDFs that use transparency blending (drop shadows, gradients over photographs, soft masks) generate additional soft-mask streams and form XObjects that add hidden bulk. These are invisible in the page count but appear as extra stream objects in the file. They are also the reason why "flatten transparency" is a common pre-press step: it eliminates these streams at the cost of rasterizing the affected regions.
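For the first category, the effect of downsampling is easy to estimate: halving the DPI quarters the pixel count. The sketch below works through the 300 to 150 DPI case for a US Letter scan. The function name is hypothetical, and the compressed byte size of a JPEG only roughly tracks pixel count, so treat the ratio as a ballpark rather than a prediction.

```python
def downsample_estimate(width_in, height_in, old_dpi, new_dpi):
    """Return (old_pixels, new_pixels, ratio) for a full-page image."""
    old_px = round(width_in * old_dpi) * round(height_in * old_dpi)
    new_px = round(width_in * new_dpi) * round(height_in * new_dpi)
    return old_px, new_px, new_px / old_px


# A full-page scan of an 8.5 x 11 inch sheet, downsampled from 300 to 150 DPI.
old_px, new_px, ratio = downsample_estimate(8.5, 11, 300, 150)
print(f"{old_px:,} px -> {new_px:,} px ({ratio:.0%} of the original)")
```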
What Page Dimensions Tell You About Compression Potential
A page's MediaBox dimensions (stored in points, where 72 points equal one inch) do not directly indicate file size, but they provide important context for interpreting it. A 210×297 mm (A4) page at 50 KB is almost certainly text-only or contains only vector graphics. A page with the same dimensions at 2 MB almost certainly contains an embedded raster image, and that image is probably at a resolution far higher than the screen or printer requires.
Standard paper sizes convert to well-known point values: A4 is 595.28×841.89 pt, US Letter is 612×792 pt, A3 is 841.89×1190.55 pt. Pages that deviate from these values (say, 794×1124 pt) are often the result of exporting from a tool that added margins differently, or scanning with a flatbed that captured a slightly non-standard crop. Non-standard dimensions matter when you are splitting a PDF and then printing: pages that are slightly oversized will be cropped or scaled by some printers in unexpected ways.
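The unit conversion and the comparison against standard sizes can be sketched as follows. The function names and the 1 pt tolerance are assumptions chosen for illustration; the size table contains only the three sizes mentioned above.

```python
# Standard sizes in points (short side, long side).
STANDARD_SIZES_PT = {
    "A4": (595.28, 841.89),
    "US Letter": (612.0, 792.0),
    "A3": (841.89, 1190.55),
}


def pt_to_inches(pt):
    return pt / 72.0  # 72 points per inch


def pt_to_mm(pt):
    return pt / 72.0 * 25.4  # 25.4 mm per inch


def match_standard_size(width_pt, height_pt, tolerance_pt=1.0):
    """Return the name of the matching standard size, or None.

    Orientation is ignored: the sides are sorted before comparing.
    """
    short, long_ = sorted((width_pt, height_pt))
    for name, (w, h) in STANDARD_SIZES_PT.items():
        if abs(short - w) <= tolerance_pt and abs(long_ - h) <= tolerance_pt:
            return name
    return None


print(match_standard_size(595, 842))    # rounded A4 values still match
print(match_standard_size(794, 1124))   # the non-standard example above
```

A small tolerance is useful because many generators round MediaBox values to whole points.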
Landscape pages embedded in a portrait document are a common source of confusion when splitting. A 200-page report with three landscape pages (tables or charts rotated 90°) must be split with awareness of those rotations, or the resulting sub-files will have inconsistent reading orientations.
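A page can be landscape either because its MediaBox is wider than it is tall or because a portrait MediaBox carries a /Rotate entry of 90 or 270. A minimal sketch of an orientation check that accounts for both, with a hypothetical function name:

```python
def effective_orientation(width_pt, height_pt, rotate=0):
    """Classify a page as displayed, combining MediaBox size and /Rotate.

    rotate is the page's /Rotate value in degrees (a multiple of 90).
    """
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    # A quarter turn (90 or 270) swaps the displayed width and height.
    if rotate % 180 == 90:
        width_pt, height_pt = height_pt, width_pt
    if width_pt > height_pt:
        return "landscape"
    if width_pt < height_pt:
        return "portrait"
    return "square"


print(effective_orientation(612, 792))       # plain portrait Letter page
print(effective_orientation(612, 792, 90))   # same MediaBox, rotated
```

Checking every page this way before a split makes it clear which sub-files will contain mixed orientations.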
The Right Workflow: Analyze Before You Compress or Split
The correct sequence for any PDF optimization task is: profile first, act second, verify third. Profiling means knowing the page count, the dimensions of every page, and the approximate byte weight of each page before running any destructive operation.
For compression, the profile tells you which pages to focus the operation on. If pages 1, 2, and 4 each account for less than 1% of the file size but page 3 accounts for 60%, compressing the entire file with a uniform quality setting is wasteful: page 3 needs targeted image downsampling, and the other pages should be left untouched to preserve text sharpness.
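Picking out the pages worth targeting is a simple share calculation. The sketch below uses made-up per-page sizes that reproduce the scenario above; the 25% threshold and the function names are arbitrary assumptions.

```python
def size_shares(page_sizes):
    """Return each page's fraction of the summed per-page sizes."""
    total = sum(page_sizes)
    if total == 0:
        return [0.0] * len(page_sizes)
    return [size / total for size in page_sizes]


def heavy_pages(page_sizes, threshold=0.25):
    """Return 1-based page numbers whose share meets the threshold."""
    return [
        page_no
        for page_no, share in enumerate(size_shares(page_sizes), start=1)
        if share >= threshold
    ]


# Hypothetical 12-page profile in bytes: page 3 holds 60% of the total,
# pages 1, 2, and 4 each hold under 1%.
page_sizes = [8_000, 9_000, 600_000, 7_000] + [47_000] * 8
print(heavy_pages(page_sizes))
```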
For splitting, the profile tells you whether the split will produce balanced sub-files or wildly uneven ones. A 100-page PDF split at page 50 might produce a 2 MB first half and a 45 MB second half if all the high-resolution images happen to fall in the back matter. Knowing this in advance lets you choose a different split point or inform the recipient about the asymmetry.
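The asymmetry in that example, and a more balanced alternative, can be computed directly from the per-page estimates. This is a sketch with hypothetical function names and sizes in KB; it ignores shared resources such as fonts, which may be duplicated into both halves after a real split.

```python
from itertools import accumulate


def split_sizes(page_sizes, split_after):
    """Return (first_part, second_part) sizes when splitting after a page."""
    return sum(page_sizes[:split_after]), sum(page_sizes[split_after:])


def most_balanced_split(page_sizes):
    """Return the page number to split after so both parts are closest in size."""
    total = sum(page_sizes)
    running = list(accumulate(page_sizes))
    # Candidate split points are after page 1 .. after page n-1.
    return min(
        range(1, len(page_sizes)),
        key=lambda k: abs(total - 2 * running[k - 1]),
    )


# Hypothetical 100-page file in KB: light front matter, image-heavy back matter.
sizes_kb = [40] * 50 + [900] * 50
print(split_sizes(sizes_kb, 50))          # the naive midpoint split
best = most_balanced_split(sizes_kb)
print(best, split_sizes(sizes_kb, best))  # a split point with even halves
```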
For archiving, the dimension data tells you whether any pages have non-standard sizes that might cause problems in an archival workflow. Conformance to a standard such as PDF/A depends mainly on constraints on embedded content (fonts must be embedded, for example), so page dimensions alone do not establish compliance.
How Browser-Based PDF Analysis Works Without a Server
The PDF format is a text-based object graph with binary streams. Reading the raw bytes of a PDF file in a browser (using the FileReader API to load the file as an ArrayBuffer) gives you access to the same cross-reference table and object dictionaries that a full PDF library would parse. From those bytes you can extract the /Count entry in the root Pages dictionary (the authoritative page count) and every /MediaBox array (page dimensions in points), and you can locate stream delimiters (the stream and endstream keywords) to estimate per-stream byte sizes.
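The extraction described above can be sketched with a few regular expressions over the raw bytes. The example is written in Python, with a bytes object standing in for the browser's ArrayBuffer, and it is a heuristic sketch rather than the tool's actual parser: the names are hypothetical, it assumes uncompressed object dictionaries, and a binary stream that happens to contain a keyword such as endobj would confuse it.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
NUM = rb"(-?[\d.]+)"
MEDIABOX_RE = re.compile(
    rb"/MediaBox\s*\[\s*" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s*\]"
)
# The lookbehind keeps the "stream" inside "endstream" from matching.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def quick_profile(pdf_bytes):
    page_count = None
    stream_sizes = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        obj_id, body = int(match.group(1).decode("ascii")), match.group(3)
        # The root Pages node carries the largest /Count in the page tree.
        if re.search(rb"/Type\s*/Pages\b", body):
            count = re.search(rb"/Count\s+(\d+)", body)
            if count:
                page_count = max(page_count or 0, int(count.group(1).decode("ascii")))
        stream = STREAM_RE.search(body)
        if stream:
            stream_sizes[obj_id] = len(stream.group(1))

    media_boxes = []
    for match in MEDIABOX_RE.finditer(pdf_bytes):
        x0, y0, x1, y1 = (float(v.decode("ascii")) for v in match.groups())
        media_boxes.append((x1 - x0, y1 - y0))  # width, height in points
    return {"page_count": page_count, "media_boxes": media_boxes,
            "stream_sizes": stream_sizes}


# A hand-written fragment (not a complete, valid PDF) to exercise the parser.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
    b"/Contents 4 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 5 >>\nstream\nBT ET\nendstream\nendobj\n"
)
profile = quick_profile(sample)
```

Restricting the /Count lookup to dictionaries typed as /Pages avoids picking up the unrelated /Count entries that outline (bookmark) dictionaries also use.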
This approach works for the vast majority of PDFs produced by office tools, design applications, and scanners. The exception is files that use PDF 1.5+ cross-reference streams, a compressed binary encoding of the xref table that some modern generators emit. Parsing the object index in those files requires an inflate (decompression) step, which is beyond what a lightweight regex-based parser handles. For those files, the tool will report the limitation clearly rather than silently producing incorrect results.
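One way to detect that case up front is to follow the final startxref offset: in a classic file it points at the xref keyword, while in a file with a cross-reference stream it points at an indirect object. The sketch below is a minimal check under that assumption, with a hypothetical function name; it does not handle hybrid files that carry both forms, and it is not necessarily how this tool performs the detection.

```python
import re


def xref_kind(pdf_bytes):
    """Return "table", "stream", or "unknown" for the last xref section."""
    offsets = re.findall(rb"startxref\s+(\d+)", pdf_bytes)
    if not offsets:
        return "unknown"
    # The last startxref in the file describes the most recent revision.
    offset = int(offsets[-1].decode("ascii"))
    if pdf_bytes[offset:offset + 4] == b"xref":
        return "table"
    # A cross-reference stream is an ordinary indirect object: "N G obj".
    if re.match(rb"\d+\s+\d+\s+obj", pdf_bytes[offset:offset + 32]):
        return "stream"
    return "unknown"


# Two hand-built fragments (not complete, valid PDFs) to exercise both branches.
body = b"%PDF-1.5\n1 0 obj\n<< >>\nendobj\n"
offset_text = str(len(body)).encode("ascii")
classic = (body + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
           b"startxref\n" + offset_text + b"\n%%EOF\n")
modern = (body + b"2 0 obj\n<< /Type /XRef >>\nstream\n\nendstream\nendobj\n"
          b"startxref\n" + offset_text + b"\n%%EOF\n")
print(xref_kind(classic), xref_kind(modern))
```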
The key advantage of running this analysis entirely in the browser is privacy: your document's contents never leave your device. For legal contracts, financial statements, medical records, or any confidential document, this matters more than the marginal capability difference between a client-side and server-side parser.