[reportlab-users] counting pages in PDF

Jerome Alet reportlab-users@reportlab.com
Fri, 18 Jun 2004 20:05:55 +0200


--fUYQa+Pmc3FrFX/N
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

Some time ago, someone asked how to count pages in a PDF document.

I suggested using Ghostscript to convert the PDF to DSC-compliant 
PostScript and then running "grep -c %%Page:" on the result.

This is slow.
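
For reference, the grep step above amounts to counting the DSC "%%Page:" 
comments in the converted PostScript. A minimal Python 3 sketch of that 
counting step follows; the function name count_dsc_pages and the sample 
text are made up for illustration, and the Ghostscript conversion itself 
is not shown.

```python
def count_dsc_pages(ps_text):
    """Count DSC %%Page: comments; DSC-compliant PostScript has one per page."""
    # "%%Pages:" (the header total) does not start with "%%Page:",
    # so it is not counted here.
    return sum(1 for line in ps_text.splitlines()
               if line.startswith("%%Page:"))


# Hypothetical two-page DSC skeleton, for illustration only.
sample = (
    "%!PS-Adobe-3.0\n"
    "%%Pages: 2\n"
    "%%Page: 1 1\n"
    "showpage\n"
    "%%Page: 2 2\n"
    "showpage\n"
    "%%EOF\n"
)
print(count_dsc_pages(sample))  # prints 2
```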

Dinu posted some code, but it worked only under Mac OS X.

So for your pleasure, here's some code which seems to work with all 
the PDF documents I've tested, and which should be completely 
cross-platform, provided you use Python 2.3 or newer: it relies on the 
universal newlines file opening mode ("U"), which first appeared in 2.3.
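
The line ending handling matters because PDF files may end their lines 
with CR, LF or CR+LF. For readers on Python 3, where the "U" mode was 
removed in 3.11, here is a rough sketch of the same counting idea on raw 
bytes. It is not the attached script: the name pdfcount_bytes and the 
sample data are assumptions for illustration, and bytes.splitlines() 
stands in for the universal newlines mode.

```python
def pdfcount_bytes(data):
    """Count objects that look like page dictionaries in raw PDF bytes."""
    pagecount = 0
    content = []  # lines of the object currently being read
    # bytes.splitlines() splits on \n, \r and \r\n alike.
    for line in data.splitlines():
        line = line.strip()
        content.append(line)
        if line.endswith(b"endobj"):
            # Normalize spacing around "/" so "/Type/Page" and
            # "/Type /Page" look alike; the trailing space in the
            # pattern keeps "/Type /Pages" from matching.
            parts = [p.strip() for p in b" ".join(content).split(b"/")]
            pagecount += b" /".join(parts).count(b" /Type /Page ")
            content = []
    return pagecount


# Hypothetical fragment with CR-only line ends: one page tree, two pages.
sample_pdf = (
    b"1 0 obj\r<< /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >>\rendobj\r"
    b"2 0 obj\r<< /Type/Page /Parent 1 0 R >>\rendobj\r"
    b"3 0 obj\r<< /Type /Page /Parent 1 0 R >>\rendobj\r"
)
print(pdfcount_bytes(sample_pdf))  # prints 2
```

To try it on a real file, read the whole file in binary mode first, for 
example with open(path, "rb").read(), and pass the result in.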

Comments are welcome.

NB: this code is licensed under the terms of the GNU GPL.
It is not in the public domain.

Jerome Alet

--fUYQa+Pmc3FrFX/N
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=pdfcount

#! /usr/bin/env python
# -*- coding: ISO-8859-15 -*-
# 
# pdfcount - a fast PDF page counter
# (c) 2004 Jerome Alet <alet@librelogiciel.com>
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
#
# 

import sys

def pdfcount(infile) :
    """Counts pages in PDF documents."""
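    pagecount = 0
    content = []    # lines of the object currently being read
    while 1 :
        line = infile.readline()
        if not line :
            break   # end of file
        line = line.strip()
        content.append(line)
        if line.endswith("endobj") :
            # A whole object has been read. Normalize the spacing around
            # every "/" so that "/Type/Page" and "/Type /Page" look alike,
            # then count page dictionaries. The trailing space in the
            # pattern keeps "/Type /Pages" (the page tree) from matching.
            names = [x.strip() for x in " ".join(content).split("/")]
            pagecount += " /".join(names).count(" /Type /Page ")
            content = []
    return pagecount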
    
if __name__ == "__main__" :    
    inputfile = open(sys.argv[1], "rbU")
    count = pdfcount(inputfile)
    inputfile.close()
    print "%s size is %i pages" % (sys.argv[1], count)

--fUYQa+Pmc3FrFX/N--