Thursday, 3 April 2014

PDF To Text Converter

The PDFToTextConverter program can be used to convert a PDF file in to a text file. When the program runs you can selected one or many PDF files for converting. Then wait for a while. The amount of time waiting depends mainly on the number of PDF files you selected and each file size. It is noted that a PDF file that does not have text can not be converted.


PDF To Text Converter

PDFToTextConverter source code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.awt.Desktop;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.swing.JFileChooser;

public class PDFToTextConverter{
public static void main(String[] args){
selectPDFFiles();
}


//allow pdf files selection for converting
public static void selectPDFFiles(){

JFileChooser chooser = new JFileChooser();
    FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF","pdf");
    chooser.setFileFilter(filter);
    chooser.setMultiSelectionEnabled(true);
    int returnVal = chooser.showOpenDialog(null);
    if(returnVal == JFileChooser.APPROVE_OPTION) {
File[] Files=chooser.getSelectedFiles();
System.out.println("Please wait...");
            for( int i=0;i<Files.length;i++){    
            convertPDFToText(Files[i].toString(),"textfrompdf"+i+".txt");
            }
System.out.println("Conversion complete");
            }

     
}

public static void convertPDFToText(String src,String desc){
try{
//create file writer
FileWriter fw=new FileWriter(desc);
//create buffered writer
BufferedWriter bw=new BufferedWriter(fw);
//create pdf reader
PdfReader pr=new PdfReader(src);
//get the number of pages in the document
int pNum=pr.getNumberOfPages();
//extract text from each page and write it to the output text file
for(int page=1;page<=pNum;page++){
String text=PdfTextExtractor.getTextFromPage(pr, page);
bw.write(text);
bw.newLine();

}
bw.flush();
bw.close();



}catch(Exception e){e.printStackTrace();}

}

}

Converting a pdf document to text file is simple. Firstly, you need to use the PdfReader class (in iText library) to get all pages of the pdf document. One you have the PdfReader object, you can extract the text from the pdf document by using the getTextFromPage(PdfReader pdfreader, int page_num) method of the PdfTextExtractor class.  This method extract the text from each page of the PdfReader object. While getting the text, you will use the BufferedWriter class to write the text out to a destination file.


Friday, 14 March 2014

Java Language Keywords

Here is a list of keywords in the Java programming language. You cannot use any of the following as identifiers in your programs. The keywords const and goto are reserved, even though they are not currently used. true, false, and null might seem like keywords, but they are actually literals; you cannot use them as identifiers in your programs.

abstractcontinuefornewswitch
assert***defaultgoto*packagesynchronized
booleandoifprivatethis
breakdoubleimplementsprotectedthrow
byteelseimportpublicthrows
caseenum****instanceofreturntransient
catchextendsintshorttry
charfinalinterfacestaticvoid
classfinallylongstrictfp**volatile
const*floatnativesuperwhile
*not used
**added in 1.2
***added in 1.4
****
added in 5.0