Read PDF files with PHP


Question

I have a large PDF file that is a floor map for a building. It has layers for all the office furniture including text boxes of seat location.

My goal is to read this file with PHP, search the document for text layers, get their contents and coordinates in the file. This way I can map out seat locations -> x/y coordinates.

Is there any way to do this via PHP? (Or even Ruby or Python if that's what's necessary)

6/16/2009 11:56:46 PM

Accepted Answer

Check out FPDF (with FPDI):

http://www.fpdf.org/

http://www.setasign.de/products/pdf-php-solutions/fpdi/

These will let you open a PDF and add content to it in PHP. I'm guessing you can also use their functionality to search through the existing content for the values you need.

Another possible library is TCPDF: http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf

Update to add a more modern library: PDF Parser

6/24/2015 6:35:24 AM

There is a PHP library (pdfparser) that does exactly what you want.

project website

http://www.pdfparser.org/

github

https://github.com/smalot/pdfparser

Demo page/api

http://www.pdfparser.org/demo

After including pdfparser in your project, you can get all the text from mypdf.pdf like so:

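<?php
// Load Composer's autoloader (assumes: composer require smalot/pdfparser)
require 'vendor/autoload.php';

// The library's classes live in the Smalot\PdfParser namespace
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('mypdf.pdf');
$text   = $pdf->getText();
echo $text; // all text from mypdf.pdf

?>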

Similarly, you can get the metadata from the PDF as well as the PDF objects (for example, images).
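Since the question asks for coordinates as well as text, here is a rough sketch of how positioned text could be turned into a seat label -> x/y map. It assumes a pdfparser release that provides Page::getDataTm(), which reports each text fragment as a pair of (text matrix, string). The helper seatCoordinates() and the sample data are made up for illustration and are not part of the library.

```php
<?php
// Sketch only: seatCoordinates() is a hypothetical helper, not part of pdfparser.
// It assumes each item has the shape [[a, b, c, d, x, y], 'text'], which is how
// Page::getDataTm() is assumed to report positioned text.

function seatCoordinates(array $items): array
{
    $seats = [];
    foreach ($items as $item) {
        [$matrix, $label] = $item;
        $label = trim((string) $label);
        if ($label === '') {
            continue; // skip empty text fragments
        }
        // Elements 4 and 5 of the text matrix are the x and y offsets
        // (PDF units, origin normally at the bottom-left of the page).
        $seats[$label] = ['x' => (float) $matrix[4], 'y' => (float) $matrix[5]];
    }
    return $seats;
}

// Usage with pdfparser (assumes a Composer install):
// require 'vendor/autoload.php';
// $pdf = (new \Smalot\PdfParser\Parser())->parseFile('mypdf.pdf');
// foreach ($pdf->getPages() as $page) {
//     print_r(seatCoordinates($page->getDataTm()));
// }

// Stand-in data so the sketch runs without a PDF; labels and numbers are invented.
$sample = [
    [[1, 0, 0, 1, 201.96, 720.68], 'Seat 12A'],
    [[1, 0, 0, 1, 310.5, 640.0], ' Seat 12B '],
    [[1, 0, 0, 1, 0, 0], '   '],
];
$seats = seatCoordinates($sample);
print_r($seats);
```

Note that a repeated label overwrites the earlier entry, so if the same seat text appears on several pages you would probably want to key the result by page as well. Whether a floor plan's text layers come through as separate fragments depends on how the PDF was produced.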


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow