Thursday, 3 April 2014

PDF To Text Converter

The PDFToTextConverter program can be used to convert a PDF file in to a text file. When the program runs you can selected one or many PDF files for converting. Then wait for a while. The amount of time waiting depends mainly on the number of PDF files you selected and each file size. It is noted that a PDF file that does not have text can not be converted.


PDF To Text Converter

PDFToTextConverter source code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.awt.Desktop;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.swing.JFileChooser;

public class PDFToTextConverter{
public static void main(String[] args){
selectPDFFiles();
}


//allow pdf files selection for converting
public static void selectPDFFiles(){

JFileChooser chooser = new JFileChooser();
    FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF","pdf");
    chooser.setFileFilter(filter);
    chooser.setMultiSelectionEnabled(true);
    int returnVal = chooser.showOpenDialog(null);
    if(returnVal == JFileChooser.APPROVE_OPTION) {
File[] Files=chooser.getSelectedFiles();
System.out.println("Please wait...");
            for( int i=0;i<Files.length;i++){    
            convertPDFToText(Files[i].toString(),"textfrompdf"+i+".txt");
            }
System.out.println("Conversion complete");
            }

     
}

public static void convertPDFToText(String src,String desc){
try{
//create file writer
FileWriter fw=new FileWriter(desc);
//create buffered writer
BufferedWriter bw=new BufferedWriter(fw);
//create pdf reader
PdfReader pr=new PdfReader(src);
//get the number of pages in the document
int pNum=pr.getNumberOfPages();
//extract text from each page and write it to the output text file
for(int page=1;page<=pNum;page++){
String text=PdfTextExtractor.getTextFromPage(pr, page);
bw.write(text);
bw.newLine();

}
bw.flush();
bw.close();



}catch(Exception e){e.printStackTrace();}

}

}

Converting a pdf document to text file is simple. Firstly, you need to use the PdfReader class (in iText library) to get all pages of the pdf document. One you have the PdfReader object, you can extract the text from the pdf document by using the getTextFromPage(PdfReader pdfreader, int page_num) method of the PdfTextExtractor class.  This method extract the text from each page of the PdfReader object. While getting the text, you will use the BufferedWriter class to write the text out to a destination file.