July 25, 2023

Convert a pdf to a text file from the command line

To convert a pdf to a text file in the shell (terminal) run:

$ pdftotext options pdf_file text_file 

pdf_file is the pdf file you want to convert to a text file.

text_file is the name of a text file that will be created as output of the command.

options are flags that tell pdftotext how to do the conversion. For example, to convert only pages 2 through 6 of a pdf, run:

$ pdftotext -f 2 -l 6 pdf_file text_file

-f specifies the first page to convert and -l specifies the last page to convert.
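
As a small sketch of how the page-range options might be wrapped for repeated use, the shell function below passes -f and -l through to pdftotext. The function name pdf_pages_to_text and its argument order are made up for illustration; only the -f and -l options come from the text above.

```shell
# Hypothetical helper: convert pages FIRST..LAST of a pdf to a text file.
# Usage: pdf_pages_to_text FIRST LAST PDF_FILE TEXT_FILE
pdf_pages_to_text() {
    if [ "$#" -ne 4 ]; then
        echo "usage: pdf_pages_to_text first last pdf_file text_file" >&2
        return 2
    fi
    # Refuse a range that ends before it starts.
    if [ "$1" -gt "$2" ]; then
        echo "first page ($1) is after last page ($2)" >&2
        return 2
    fi
    # -f is the first page to convert, -l is the last page to convert.
    pdftotext -f "$1" -l "$2" "$3" "$4"
}
```

With this function defined, pdf_pages_to_text 2 6 pdf_file text_file should run the same command as the example above.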

To learn more about pdftotext run:

$ man pdftotext

There might be dozens of other pdf utilities like this on your system. Run $ apropos pdf to list them (run $ man apropos to learn more about apropos).
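
apropos searches the names and short descriptions of the installed manual pages. As a rough complementary sketch, the snippet below reports whether a few named commands can be found by the shell. The function name have_tool is made up for illustration, and the tool names in the loop are only examples that may or may not exist on your system.

```shell
# Hypothetical helper: succeed if the shell can find a command with this name.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Example names only; replace them with the tools you care about.
for tool in pdftotext pdfinfo pdftoppm; do
    if have_tool "$tool"; then
        echo "$tool: installed"
    else
        echo "$tool: not found"
    fi
done
```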

pdftotext is copyleft-licensed and was first released in 1995. It was written, and is still developed, by Derek Noonburg [1].

  1. xpdfreader’s website.


