Convert a pdf to a text file from the command line with pdftotext
To convert a pdf file to a text file in the shell (terminal), run:
$ pdftotext options pdf_file text_file
pdf_file
is the pdf file you want to convert to a text file.
text_file
is the name of the text file the command creates as output. If you leave it out, pdftotext names the output after the pdf file, replacing .pdf with .txt.
options
are optional flags that control how pdftotext converts the file. For example, to convert only pages 2 through 6 of a pdf to text, run:
$ pdftotext -f 2 -l 6 pdf_file text_file
-f
specifies the first page to convert.
-l
specifies the last page to convert.
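Because pdftotext is an ordinary command, these options combine easily in small shell functions. The sketch below assumes bash (or another shell that supports local); the function names pdf_page and pdfs_to_text are hypothetical and are not part of pdftotext. pdf_page sets -f and -l to the same number to extract a single page, and passes - as the output name, which pdftotext treats as standard output. pdfs_to_text converts every pdf in a directory to a text file with the same base name.

```shell
# Hypothetical helpers, not part of pdftotext. Assumes bash.

# pdf_page FILE PAGE
# Prints the text of one page to standard output:
# -f and -l both set to PAGE select just that page, and
# "-" as the output name sends the text to stdout instead of a file.
pdf_page() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pdf_page FILE PAGE" >&2
    return 2
  fi
  pdftotext -f "$2" -l "$2" "$1" -
}

# pdfs_to_text [DIR]
# Converts every *.pdf in DIR (default: the current directory) to a
# .txt file with the same base name, e.g. report.pdf -> report.txt.
# Returns 1 if any conversion failed.
pdfs_to_text() {
  local dir=${1:-.} pdf status=0
  for pdf in "$dir"/*.pdf; do
    # With no matches the glob stays literal, so skip names that do not exist.
    [ -e "$pdf" ] || continue
    if ! pdftotext "$pdf" "${pdf%.pdf}.txt"; then
      echo "pdfs_to_text: could not convert $pdf" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, after pasting the functions into your shell (report.pdf and ~/Documents are placeholder names):
$ pdf_page report.pdf 3 | less
$ pdfs_to_text ~/Documents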
To learn more about pdftotext, run:
$ man pdftotext
There may be dozens of other pdf utilities like this on your system. To list them, run:
$ apropos pdf
To learn more about apropos, run:
$ man apropos
pdftotext is copyleft-licensed and was first released in 1995. It was written, and is still developed, by Derek Noonburg.[1]
[1] xpdfreader’s website.