Mihail Radu Solcan

  Home made RTF

2009-01-05

January 2007: the debate around the OLPC project has reached Romania. Teachers were also engaged in the debate. Many questions were raised, but I want to focus on one specific objection: that OLPC is problematic because the laptop cannot run Word (see, for example, this discussion among teachers).

Do we really need Word or even OpenOffice for writing Word-compatible documents?

Let's examine the case of the essays of my students. These are academic essays on philosophical topics: no images, no tables etc. What is in there? Text organized into sections, footnotes. In fact, you do not need the whole machinery of Word or OpenOffice for producing simple, but useful electronic texts. 

What is really wrong with Word

Word and OpenOffice do not separate the content of the text from the layout. The idea is that you work on the text as it will look on the printed page. This is a very bad idea. It goes against the principle of the division of labor: you are both a writer and a typesetter.

The printed word generates an illusion; everything that is printed seems to have some sort of authority. By making us write a text which looks as it will on paper, Word and other editing software create a trap: our own words seem to have an aura.

There is another illusion: that you can use a computer and be completely illiterate in programming, that you just have to know how to read, write and press buttons.

Select and press this button! Your text will be … underlined. Indeed it will. But something happens behind the display. There is a language used by the program: somehow the selected text is bracketed and marked up in a way which tells the machine to display and print it underlined. 
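To make the point concrete, here is a minimal Python sketch of what a program might store when you press the underline button. It assumes RTF (the format discussed later in this text) as the underlying language, where \ul switches underlining on inside a group delimited by braces. The function name underline is made up for this illustration; it is not taken from any real editor.

```python
def underline(text):
    # An RTF group: the opening brace starts it, \ul turns underlining on,
    # and the closing brace ends both the group and the underlining.
    return r"{\ul " + text + "}"

# What the reader sees as an underlined word is, behind the display,
# a word wrapped in markup.
print("This word is " + underline("important") + ".")
```

The button and the markup do the same thing; the markup just lets you see what was done.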

Hiding the language is not a good idea. It cripples thinking. Let's say that I write by hand and I draw a line below a word. This is also a sign; it's part of the text. I want to say something when I underline the text. It is part of the content.

Thus it makes sense to use not buttons, but a language for the description of the content. How the text will look when printed on paper is another matter and should not concern us as writers. 

RTF

Until not so long ago the files generated by default by Word had the extension doc. The doc files were mysterious, since no public document explained their structure. 

Recently, a new format has been adopted for Word. A debate raged around OOXML. 

Beyond all this, there is a simple problem. How do you transfer documents between applications? Microsoft has an answer and this answer is public: RTF.

According to Sean M. Burke, “RTF is a document format” (RTF Pocket Guide, p. 1). On the cover, RTF is called “the Universal Document Format”.

The acquisition of Burke's book has been one of my best investments. It did not cost much (even by Romanian standards) and it saved me the trouble of writing with WYSIWYG editors like Word or OpenOffice.

Sean M. Burke has a very useful web page on RTF. You will find there far better explanations of RTF than I could formulate. 

The meaning of the home made RTF

I do not intend to offer a tool for general, unrestricted use of LaTeX. The subset of LaTeX is a means of generating RTF, not an end in itself.

I have observed that in my profession people might write whole books that use a very limited number of constructions: titles for sections, emphasized text and footnotes, maybe some subscript or superscript. The real problem is rather the unusual characters in philosophical terms.
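As an illustration of the character problem, here is a minimal Python sketch of how an unusual character can be written in RTF. It is not the code of ltx4rtf. It assumes the RTF Unicode escape, a backslash followed by u, the decimal code of the character and a fallback character (a question mark here). The function name rtf_escape is made up for this example.

```python
def rtf_escape(text):
    # Protect the three characters RTF reserves for its own syntax and
    # replace every non-ASCII character with a \uN? escape.
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)
        elif code < 128:
            out.append(ch)
        elif code < 32768:
            out.append("\\u%d?" % code)
        elif code < 65536:
            # RTF reads the number as a signed 16-bit value
            out.append("\\u%d?" % (code - 65536))
        else:
            # Characters beyond U+FFFF are left out of this sketch
            raise ValueError("character not handled: %r" % ch)
    return "".join(out)

print(rtf_escape("Schnädelbach"))  # prints Schn\u228?delbach
```

A script of this kind is adapted by adding cases, which is exactly what makes a small home-made tool convenient for a limited set of signs.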

I improvised the Python script while I was translating Schnädelbach's book on the theory of knowledge. Unlike my university, the publisher did not accept a PDF. The publisher wanted a file for Word. So I used ltx4rtf to generate an RTF file that contained just the content of the book. The publisher took care of the layout.

You cannot use the script ltx4rtf for technical books or books with mathematical formulas. The script is in what programmers call the alpha stage. I would even say that in order to reach another stage, it needs a full revision of its basic strategy, something that I am not planning to do in the near future. 

I use the script mainly when I want to be able to generate a variety of files (dvi, pdf, djvu, odt, doc) and keep working without invoking monstrous WYSIWYG tools. RTF is the key to such formats as odt, doc or even docx.

How do I type all those commands in Vim? I use a set of vim-menus. You must put the vim files from the archive in the plugin directory of your VIMHOME (see Vim documentation).

The tools you'll need

In order to “cook” your own RTF according to the following recipe, you will need basically two programmer's tools.

First, you need a programmer's editor. I tend to favor Vim. Any similar editor can be used, but Vim usually comes with xxd. You need

xxd

for the insertion of images in RTF. If you do not have this utility, you cannot use images. 

There will probably be a problem with xxd under Windows. You must modify the Python script where it calls xxd.
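One possible direction for such a modification, sketched in Python: produce the hexadecimal text with Python's own standard library instead of calling xxd. This is only a sketch. It makes no assumption about how ltx4rtf actually invokes xxd, and the function names hex_dump and rtf_picture are invented for the example. The 30 bytes per line imitate the plain dump of xxd -p. In RTF, \pict opens a picture group and \pngblip declares the data to be a PNG image.

```python
import binascii

def hex_dump(data, width=30):
    # Lowercase hexadecimal text, `width` bytes (2 * width characters) per line
    hexed = binascii.hexlify(data).decode("ascii")
    step = 2 * width
    return "\n".join(hexed[i:i + step] for i in range(0, len(hexed), step))

def rtf_picture(png_bytes):
    # Wrap the hexadecimal text of a PNG image in an RTF picture group
    return "{\\pict\\pngblip\n" + hex_dump(png_bytes) + "\n}"

# The first eight bytes of every PNG file (the PNG signature),
# used here in place of a real image
signature = b"\x89PNG\r\n\x1a\n"
print(rtf_picture(signature))
```

For a real image you would read the file in binary mode and pass its bytes to rtf_picture.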

Second, you need Python on your computer. This is not a problem. Python interpreters are available for GNU/Linux, Mac, Windows and other platforms. The OLPC's laptop has Python on it! 

You might need other tools too, but they are not essential for the present discussion. For example, you need a spell checker. GNU Aspell does an excellent job and works in text mode. 

You also need a viewer. Under GNU/Linux I use AbiWord or OpenOffice as viewers for RTF files. 

Microsoft offers a Word Viewer for free (for Windows systems). Unfortunately, only Word Viewer 2003 is now available for download. It cannot be installed on Win98 or older systems. 

Vim, Python and the viewer will not cost you anything. And certainly they will not put your computer in danger. 

The main idea

A quick glance at Sean Burke's book will show you that you cannot write RTF by hand. You can write a few short examples, but longer texts cannot be written manually. 
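For a sense of what a short hand-written example looks like, here is a Python sketch that assembles a complete one-paragraph RTF document. The header is the usual minimal one (RTF version 1, ANSI character set, one font in the font table). The function name minimal_rtf and the choice of Times New Roman are assumptions made for the illustration.

```python
def minimal_rtf(body):
    # Header: RTF version 1, ANSI character set, default font number 0,
    # followed by a font table with a single font
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"
    # \pard starts a paragraph with default settings, \fs24 means 12 points
    # (sizes are given in half-points), \par ends the paragraph
    paragraph = r"\pard\f0\fs24 " + body + r"\par"
    # The final brace closes the group opened by the header
    return header + "\n" + paragraph + "\n}"

print(minimal_rtf("Hello, RTF!"))
```

Even this tiny document shows why longer texts are impractical to write this way: every brace must be balanced and every control word placed correctly.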

The key to my recipe is to use a subset of LaTeX and a Python script. For the Python script and the examples, see the Downloads section.

Please use an empty folder for your work. Use copies of the files! One never knows what might happen. In sum, the recipe comes with no warranty; you use it at your own risk. The recipe is under the GNU license. Use it if you like it, but do not hide the code in any form!

A subset of LaTeX

LaTeX is built on top of TeX, and TeX is Turing complete.

LaTeX has been developed by a very active community. As for any other major programming language, there are a lot of packages.

There is a project called latex2rtf. Its main aim is to offer a translator, a program that renders a document written in LaTeX in the RTF document format. The translator does not cover all documents created using LaTeX.

My aim is much more modest. The subset of LaTeX is very limited. I call the script which generates the RTF “ltx4rtf”: LaTeX for RTF, that is, LaTeX written with a special purpose in mind.

My practical aim was to write texts in LaTeX that could be easily integrated, with few modifications, in other LaTeX documents. Unlike latex2rtf, the Python script is easy to adapt to various special characters or constructions. When I need a certain Greek letter or some other sign, I just adapt the script.

You will find in the 2009-01-05 archive two important files: skeleton.tex and ltx4rtf (the version from 2009-01-05). The first is what its name suggests. The second is the Python script. 

First, let's put something in the skeleton.tex. I renamed it tv.tex (always work with copies!). The content is in Romanian, but this does not matter. What follows is a description of the constructions illustrated in the tv.tex file. 

Put the name of the author between the braces of

\author{}

Put a title in

\title{}

Insert the date in

\date{}

Remember! It is a program. Any mistake will cause a failure. You must put the name, the title and everything else exactly in its place.
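Why exact placement matters can be seen in a small Python sketch of how a script might pick up these three fields. This is an illustration under my own assumptions, not the code of ltx4rtf. The function name read_field and the sample author and title are invented.

```python
import re

def read_field(source, name):
    # Look for \name{...} and return the text between the braces,
    # or None when the construction is not written exactly like that.
    match = re.search(r"\\" + name + r"\{([^{}]*)\}", source)
    return match.group(1) if match else None

# An invented preamble in the style of skeleton.tex
sample = (
    "\\author{A. Student}\n"
    "\\title{An Essay}\n"
    "\\date{2009-01-05}\n"
)

print(read_field(sample, "author"))
print(read_field(sample, "date"))
# A name written outside the braces is simply not found
print(read_field("\\author A. Student", "author"))
```

A pattern of this kind has no tolerance for a missing brace, which is why a small slip in the source leads to a failure.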

Insert a blank line after

\begin{document} and \maketitle{}

and write some text. 

Now, a new line does not mean a new paragraph. It is good practice to write every sentence of the paragraph on a different line. 

The signal for a new paragraph is a blank line. In fact, this is the command of the language that we use most often in ordinary texts. 

The backslash followed by verb means that the text between | | (the bars after the keyword verb) will be reproduced using Courier fonts. It is safer to write such a command on a separate line. Its use is much more limited than in LaTeX. You cannot put LaTeX commands between the bars of verb. In RTF, it is just a way of using Courier.

A line which begins with a percent sign is a comment. A comment is not shown in print or on the display.
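The rules about lines, blank lines and comments can be summed up in a short Python sketch. It is a simplified illustration, not the code of ltx4rtf: lines are joined into one paragraph, a blank line ends the paragraph, and a line beginning with a percent sign is dropped. The function name paragraphs is made up.

```python
def paragraphs(source):
    # Split a text into paragraphs the way the rules above describe
    result, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            continue  # a comment line: it never reaches the output
        if stripped == "":
            # a blank line closes the paragraph collected so far
            if current:
                result.append(" ".join(current))
                current = []
        else:
            # an ordinary new line only continues the paragraph
            current.append(stripped)
    if current:
        result.append(" ".join(current))
    return result

sample = "First sentence.\nSecond sentence.\n% a comment\n\nNew paragraph.\n"
for text in paragraphs(sample):
    # \pard ... \par is the RTF way of marking one paragraph
    print(r"\pard " + text + r"\par")
```

Writing one sentence per line costs nothing in the output and makes the source much easier to edit and to compare between versions.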

And now some other commands, as they appear in the demo file tv.tex from the 2009-01-05 archive:

The command \mbox{} means that the text between braces will be kept together. 

The next command is very important for academic writing: the backslash followed by the keyword footnote means that the text between the following braces is the text of a footnote. 

The content of the footnote must be inserted between the braces. Write the footnote on the same line. Use only one footnote per line. Remember that you can split the sentence over several lines. If you have problems with blank spaces, use a backslash followed by a space and a pair of braces. The space after the backslash is significant!
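The restriction to one footnote per line becomes understandable if you imagine how a simple script might find the footnote. The Python sketch below is a guess at the general technique, not the actual code of ltx4rtf. It counts braces to find the end of the footnote, so nested braces inside the note are allowed, but escaped braces are not handled. The RTF produced uses the \footnote group with \chftn for the automatic footnote number. The function names are invented.

```python
def split_footnote(line):
    # Return (before, note, after) for the first \footnote{...} on the line,
    # or None when the line has no footnote.
    marker = "\\footnote{"
    start = line.find(marker)
    if start == -1:
        return None
    depth, pos = 1, start + len(marker)
    while pos < len(line) and depth > 0:
        if line[pos] == "{":
            depth += 1
        elif line[pos] == "}":
            depth -= 1
        pos += 1
    if depth != 0:
        raise ValueError("unbalanced braces in footnote")
    return line[:start], line[start + len(marker):pos - 1], line[pos:]

def footnote_to_rtf(line):
    parts = split_footnote(line)
    if parts is None:
        return line
    before, note, after = parts
    # A superscript reference mark in the text, then the footnote group
    return (before + "{\\super\\chftn}"
            + "{\\footnote\\pard\\plain{\\super\\chftn} " + note + "}" + after)

print(footnote_to_rtf(r"A claim.\footnote{See {\em the second chapter}.} More text."))
```

The sketch leaves the emphasized words inside the note untouched; a real script would convert them in a further step.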

For the insertion of images follow the pattern from the tv.tex file in the 2009-01-05 archive. Insert the name of the figure without extension and use png images. Explain the content of the figure between the braces of the keyword caption.

A note of caution must be added. The Python script works unmodified under GNU/Linux. As I have noted above, you need xxd for the images. Do not use images in your first tests with the script! 

Lists are an exception! In lists you can write after the braces of \item (see the file tv.tex in the archive for a practical illustration). 

Write the text you want to emphasize between a special pair of braces. The first brace is followed by \em and a space. 

The same explanation is valid for writing bold text. This time, the first brace is followed by \bf and a space.

However, in the case of \textsc and \textit you write between the braces which follow the keyword. The same is valid for the names of chapters and sections. 
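The two styles of markup described above can be compared in a Python sketch that turns them into RTF groups. This is an illustration, not the code of ltx4rtf: the tables and the function name convert_emphasis are my own, and the patterns deliberately refuse nested braces. The RTF control words used are \i (italic), \b (bold) and \scaps (small capitals).

```python
import re

# LaTeX construction -> RTF control word (an illustrative, incomplete table)
DECLARATIONS = {"em": "i", "bf": "b"}          # written as {\em text}
COMMANDS = {"textit": "i", "textsc": "scaps"}  # written as \textit{text}

def convert_emphasis(line):
    for latex, rtf in DECLARATIONS.items():
        # {\em text}  ->  {\i text}
        line = re.sub(
            r"\{\\" + latex + r" ([^{}]*)\}",
            lambda m, rtf=rtf: "{\\" + rtf + " " + m.group(1) + "}",
            line)
    for latex, rtf in COMMANDS.items():
        # \textit{text}  ->  {\i text}
        line = re.sub(
            r"\\" + latex + r"\{([^{}]*)\}",
            lambda m, rtf=rtf: "{\\" + rtf + " " + m.group(1) + "}",
            line)
    return line

print(convert_emphasis(r"a {\em very} good book by \textsc{Kant}"))
# prints: a {\i very} good book by {\scaps Kant}
```

In RTF both styles end up as the same kind of group, so the difference matters only on the LaTeX side.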

The keyword \label marks a location in the text to which a \ref points. Of course, you must put the same name (without spaces) between the braces of these two commands.

Unlike the other commands, \href has two arguments. You must put a valid URL between the first pair of braces and a text between the second pair of braces. Write \href commands on different lines. 
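A command with two arguments can be handled along the same lines. The Python sketch below is an assumption about how such a conversion might be done, not the code of ltx4rtf. It rewrites \href as an RTF field with a HYPERLINK instruction and a visible result. The function name convert_href is invented, and neither argument may contain braces.

```python
import re

# \href{URL}{text}: two pairs of braces, no nesting allowed in this sketch
HREF = re.compile(r"\\href\{([^{}]*)\}\{([^{}]*)\}")

def convert_href(line):
    def field(match):
        url, text = match.group(1), match.group(2)
        # \fldinst holds the instruction, \fldrslt the text shown to the reader
        return ('{\\field{\\*\\fldinst{HYPERLINK "' + url + '"}}'
                '{\\fldrslt{' + text + '}}}')
    return HREF.sub(field, line)

print(convert_href(r"\href{http://www.python.org}{Python}"))
```

Keeping each \href on its own line, as recommended above, keeps patterns of this kind simple and predictable.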

The URL in \url will be just typed as it is (with Courier fonts). 

Any basic introduction to LaTeX will explain this subset of the language. There are some restrictions, however, in ltx4rtf, in the case of \footnote and \href. Also, you cannot use nested constructions, with the exception of the footnotes. You must, however, test the limits of the script whenever you use an unusual construction.

Use ltx4rtf

Let us suppose that the script ltx4rtf is in the current directory. Then, open a command-line in the current directory and use the command

./ltx4rtf tv.tex

to generate an RTF file named tv.rtf.

I keep ltx4rtf in a folder named bin, which is declared in the PATH environment variable. In this way, you can type in any folder

ltx4rtf name-of-the-file.tex

Downloads

2009-01-05 archive

In the 2009-01-05 archive you will find the script, the skeleton.tex and a series of examples.

The file tv.rtf is generated from tv.tex by the Python script included in the archive. 

The file tv.dvi is generated from tv.tex with latex. The file tv.pdf is created from tv.tex with pdflatex. You must have a LaTeX distribution on the computer in order to create them.

The file tv.djvu is converted from tv.pdf with pdf2djvu.

The files tv-ooo.odt, tv-ooo.doc and tv-ooo.pdf are generated with OpenOffice from tv.rtf. 

You need the dwn.eps image only for the dvi file.