Mediawiki2latex

Draft

Introduction

Mediawiki2latex (also known as wb2pdf) is a tool created by Dirk Huenniger that exports MediaWiki pages and article collections to LaTeX, PDF, EPUB and ODT.

It can be used to (1) create print documents on demand and (2) export book projects that start from wiki pages as draft documents.

Disclaimer: This is not an official documentation page. Prior to Feb 10, 2019, this toolset was designed to work with standard installations, i.e. not our type of MediaWiki installation. A quick fix now makes it possible to create PDFs from book collections on demand. Other functionality may be implemented at a later stage.

This page explains how to use mediawiki2latex and how to install it on an Ubuntu system.

Using

Mediawiki2latex works best with Wikipedia. As of Feb 13, certain functionality does not work with this wiki, but may work on others. Below we introduce the options you have for using this tool. The command line is probably the most productive option.

Official online server

The official online server only processes books within limits, so we recommend installing your own copy if you have a Debian/Ubuntu machine.

Using your own is faster and will take load off the official server.

Your own local server

You can run your own server, either publicly or locally.

mediawiki2latex -s PORT_NUMBER
e.g.
mediawiki2latex -s 8080

Command line

Again, some of these may not work with your wiki. Some combinations of parameters do not work, e.g. one cannot combine "bookmode" and "user templates".

See also the official manual, which includes more information.

  -V, -?, -v    --version, --help     show version number
  -o FILE       --output=FILE         output FILE (REQUIRED)
  -f START:END  --featured=START:END  run selftest on featured article numbers from START to END
  -x CONFIG     --hex=CONFIG          hex encoded full configuration for run
  -s PORT       --server=PORT         run in server mode listen on the given port
  -t FILE       --templates=FILE      user template map FILE
  -r INTEGER    --resolution=INTEGER  maximum image resolution in dpi INTEGER
  -u URL        --url=URL             input URL (REQUIRED)
  -p PAPER      --paper=PAPER         paper size, one of A4,A5,B5,letter,legal,executive
  -m            --mediawiki           use MediaWiki to expand templates
  -h            --html                use MediaWiki generated html as input (default)
  -e            --tableslatex         use LaTeX to gernerate tables
  -n            --noparent            only include urls which a children of start url
  -k            --bookmode            use book-namespace mode for expansion
  -z            --zip                 output zip archive of latex source
  -b            --epub                output epub file
  -d            --odt                 output odt file
  -g            --vector              keep vector graphics in vector form
  -i            --internal            use internal template definitions
  -l DIRECTORY  --headers=DIRECTORY   use user supplied latex headers
  -c DIRECTORY  --copy=DIRECTORY      copy LaTeX tree to DIRECTORY

Example code for books (replace URL by your own)

  • Generate a PDF from a wiki book ("Collection" extension), starting from the HTML code
mediawiki2latex -o book.pdf -k -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Book_title
with some more parameters
mediawiki2latex -o book.pdf -k -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Broderie_num%C3%A9rique -c livrelatex -t /usr/share/mediawiki2latex/latex/templates.user -r 250 -p A4
  • Generate a PDF from a wikibook, starting with wiki code and use templates.

mediawiki2latex -o book.pdf -m -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Broderie_num%C3%A9rique -c livrelatex -t /usr/share/mediawiki2latex/latex/templates.user

  • Create a LibreOffice document from a collection (see the comments below regarding LibreOffice)
mediawiki2latex -o book.odf -k -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/XXX_YYY -c .
  • Create a zip file with latex and assets
mediawiki2latex -o book.zip -k -z -u https://edutechwiki.unige.ch/fr/BookNS:Books/Book_title

Example code for articles

  • Create a page using wiki expansion (not working as of Feb 11 2019 in this wiki)
mediawiki2latex -o article.pdf -m -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine"
  • Create a page using templates defined in a template file (passed with the -t option)
mediawiki2latex -o article.pdf -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine" -t /usr/share/mediawiki2latex/latex/templates.user
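
The command lines above can also be assembled from a script, which helps when you export several books with the same settings. Below is a minimal Python sketch. The function name build_command and its parameters are our own invention (they are not part of mediawiki2latex); only flags from the option list above are used.

```python
import shlex


def build_command(url, output, bookmode=False, templates=None,
                  resolution=None, paper=None, copy_dir=None):
    """Return a mediawiki2latex argument list (hypothetical helper)."""
    cmd = ["mediawiki2latex", "-o", output, "-u", url]
    if bookmode:
        cmd.append("-k")                # --bookmode
    if templates:
        cmd += ["-t", templates]        # --templates=FILE
    if resolution is not None:
        cmd += ["-r", str(resolution)]  # --resolution=INTEGER (dpi)
    if paper:
        cmd += ["-p", paper]            # --paper=PAPER
    if copy_dir:
        cmd += ["-c", copy_dir]         # --copy=DIRECTORY
    return cmd


cmd = build_command(
    "https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Book_title",
    "book.pdf", bookmode=True, resolution=250, paper="A4")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

To actually start the export, the list can be handed to subprocess.run(cmd, check=True). Passing a list avoids quoting problems with URLs that contain parentheses.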

Template Tweaking

The easiest way is to use HTML mode, since templates will already be expanded. However, you may then get unwanted content. Alternatively, you can retrieve the pages in wiki mode, but you then have to define LaTeX templates. See the official documentation.

If you set $wgDefaultUserOptions['numberheadings'] = 1; in LocalSettings, remove it temporarily while mediawiki2latex downloads the articles. Alternatively, use wiki mode, if it works in your wiki.

Reduce image size to 400px. (I have to test if this works with thumbnails).

Exclude all templates you don't want: copy /usr/share/mediawiki2latex/latex/templates.user, edit the copy, then pass it with the "-t" option, since the templates.user file is not read automatically by the system.

E.g. for starters, kill unwanted navigation widgets, like the following in our wiki

["tutorial","LaTeXNullTemplate"],
["tutoriel","LaTeXNullTemplate"],
["syllabus","LaTeXNullTemplate"]

If you want your own templates, you need to define three things:

  • A Mediawiki template
  • A line in the templates.user file
  • A latex template definition in file headers/templates.tex

Below is an example: Wrapping of images and size reduction (width is a fraction of line width)

Mediawiki template:

<!--
Modèle utilisé pour wrapfigure avec mediawiki2latex.
Affiche l'image dans le wiki, mais insère une commande LaTeX dans l'export PDF.
Ne pas modifier sans comprendre l'effet sur l'export.
-->
<noinclude>
== Usage ==
<nowiki>
{{Latex Wrapfigure
 |pos=r
 |width=0.5
 |image=[[image:myimage.jpg|300px|right|.....]]
}}
</nowiki>

The arguments are totally ignored by the Wiki, except for the definition of the image, the only item passed to the wiki.

[[Category:Modèles]]
</noinclude><includeonly>{{{image}}}</includeonly>

Template user file entry:

.....,
["Latex Wrapfigure","Mywrapfigure","pos","width","image"],
["Latex Makebox","Mymakebox","width","content"],

Latex template:

\usepackage{wrapfig}
\DeclareRobustCommand{\Mywrapfigure}[3]{%
  \begin{wrapfigure}{#1}{#2\textwidth}
  \centering
  \leavevmode
  #3
  \end{wrapfigure}
}

Copyright information / header

You can define your own headers by modifying and recompiling

document/headers/options.tex

Otherwise, use the --headers option.

Note: this has not been tested so far.

Running the latex code manually

There are two reasons for running the LaTeX code manually: either you want to edit the LaTeX rather than the wiki code, or you need to run it in order to debug.

  • Generate the Latex files with the "-c" flag, e.g.

mediawiki2latex -o book.pdf -k -u ... -c wikibook -t .... -r 250 -p A4

  • Run it like this from the wikibook/document/main directory

lualatex --interaction=nonstopmode main.tex
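
After editing the LaTeX by hand you usually need more than one pass, because LaTeX only resolves the table of contents and cross-references on a later run. Here is a minimal Python sketch that repeats the command above. The helper names latex_command and run_latex are made up for this page, and the directory is the one produced by "-c wikibook".

```python
import subprocess


def latex_command(engine="lualatex", main_file="main.tex"):
    """Build the same command line as shown above."""
    return [engine, "--interaction=nonstopmode", main_file]


def run_latex(workdir, runs=2, engine="lualatex", runner=subprocess.run):
    """Run the engine `runs` times inside workdir and return the exit codes.

    `runner` defaults to subprocess.run; it is a parameter only so that
    the function can be tried out without a LaTeX installation.
    """
    codes = []
    for _ in range(runs):
        result = runner(latex_command(engine), cwd=workdir)
        codes.append(result.returncode)
    return codes


# Example call (not executed here):
# run_latex("wikibook/document/main", runs=2)
```

A non-zero exit code in the returned list means that the run reported errors; the details are in main.log in the same directory.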

Creating wiki books

Using transclusion

You could create a wiki page that includes (transcludes) other articles. However, there is a processing limit: if you include dozens of pages, you may experience slowdowns or exceed the maximum number of templates allowed. Nevertheless, as of April 2019, you must use this method to create books with template expansion (-m or -t flag).

= title 1 =
{{:MyPageOne}}
= title 2 =
{{:MyPageTwo}}

Example using template expansion

 mediawiki2latex -o book.pdf -u  https://edutechwiki.unige.ch/fr/Daniel_K._Schneider/My_Book -t /usr/share/mediawiki2latex/latex/templates.user

Make sure that included pages start title numbering with "==" and not "=".
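
For longer books, the transclusion page shown above can be generated instead of typed. The following is a small Python sketch; the function name transclusion_page and the sample titles are placeholders, not something mediawiki2latex provides.

```python
def transclusion_page(chapters):
    """Build wikitext with one level-1 heading and one transclusion per chapter.

    chapters: list of (heading, page_title) pairs.
    """
    parts = []
    for heading, page_title in chapters:
        parts.append(f"= {heading} =")
        # {{:Title}} transcludes a page from the main namespace.
        parts.append("{{:" + page_title + "}}")
    return "\n".join(parts)


print(transclusion_page([("title 1", "MyPageOne"), ("title 2", "MyPageTwo")]))
```

The printed text can be pasted into the wiki page that serves as the book.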

Using collection extension

The recommended solution is to use wiki books defined by the collection extension, i.e. use a feature from the alternative PediaPress technology.

As of Feb 2019, this works with our wikis using default mode (html-based). It fails using "wiki" template expansion.

Mediawiki Design Principles

Mediawiki2latex requires some design rules for mediawiki code:

Tables

  • Don't leave empty cells in tables; use, for example, a "-"
  • To force use of smaller fonts you can add latexfontsize="scriptsize" into the header of the table.

Pictures

  • Pictures that are 400px wide or larger will use the full page width
  • Gallery code: does not work well. I suggest replacing it with tables; ChatGPT can do the conversion.

Installation of Latex

It is likely that your current LaTeX installation is not recent enough. We therefore suggest installing the latest version of TeX Live manually. It can be installed alongside an existing system installation. TeX Live should include important packages like xetex.

  1. cd /src # working directory of your choice
  2. Download: wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz or https://www.tug.org/texlive/acquire-netinstall.html
  3. zcat < install-tl-unx.tar.gz | tar xf - # note final - on that command line
  4. cd install-tl-2*
  5. perl ./install-tl --no-interaction # as root or with writable destination # may take several hours to run
  6. Finally, prepend /usr/local/texlive/YYYY/bin/PLATFORM to your PATH, e.g. /usr/local/texlive/2025/bin/x86_64-linux, i.e. edit ~/.profile and set PATH=/usr/local/texlive/2025/bin/x86_64-linux:$PATH

Here is the short version I used

wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
tar -xzf install-tl-unx.tar.gz
cd install-tl-*/
sudo ./install-tl

Edit the .profile file: PATH=/usr/local/texlive/2025/bin/x86_64-linux:$PATH

Test the installation:

latex small2e
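
If the test fails, the usual cause is that the new TeX Live directory is not yet on the PATH of the current shell. A quick way to see which executables are picked up is the Python sketch below. The helper name locate_tools and the list of program names are our choice.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_tools(["latex", "lualatex", "xelatex", "kpsewhich"]).items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

If the paths point to /usr/bin rather than /usr/local/texlive/..., the system packages are still found first, so check the PATH line in ~/.profile and reopen the terminal.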

Alternatively you could just install the system's package, however, depending on the age of your OS it may be outdated.

sudo apt-get install texlive-xetex

You could also consider installing a LaTeX editor.

Installation for Ubuntu

There are various options. In our experience (July 2025), it is best to do a manual Haskell installation.

Standard Ubuntu package

This will give an outdated version that has some important features missing.

sudo apt-get install mediawiki2latex

If needed, also install ImageMagick. Type mediawiki2latex to see if it's there.

sudo apt-get install imagemagick
mediawiki2latex

This worked for Ubuntu 24.04.2 LTS (better than the manual installation below).

Then do the configuration as explained further below; in particular, LaTeX needs memory.

Standard Installation under WSL

(Windows Subsystem for Linux, running Ubuntu)

The installation is the same as for Ubuntu. It will give an outdated version that has some important features missing.

sudo apt-get install mediawiki2latex

Then do the configuration as explained further below; in particular, LaTeX needs memory.

Finally, make sure to close the terminal and open it again in order to get the PATH right (or else reload your shell's startup file).

Installation of the latest Debian package

Debian packages are newer than Ubuntu ones. However, installing a Debian package might be a bad idea, since it could create conflicts with Ubuntu packages.

Installation steps (thanks to ChatGPT):

  0. Add keys if step 1 fails with missing keys

for key in 0E98404D386FA1D9 6ED0E7B82643E131 F8D2585B8783D481; do gpg --keyserver keyserver.ubuntu.com --recv-keys $key && gpg --export $key | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/$key.gpg; done

  1. Add bookworm repo (won’t affect full system)
echo "deb http://deb.debian.org/debian bookworm main" | sudo tee /etc/apt/sources.list.d/bookworm.list
  2. Update APT
sudo apt update
  3. Install mediawiki2latex *only* from bookworm
sudo apt install -t bookworm mediawiki2latex
  4. (Optional) Prevent upgrades/downgrades
echo "mediawiki2latex hold" | sudo dpkg --set-selections

Installation of the cutting edge Cabal version

We recommend installing this (or similar).

Cabal is the Haskell package installer, similar to what pip is to Python or npm is to Node.js (JavaScript).

The scripts below may require additional installations

  • A recent curl
cd /tmp
wget https://curl.se/download/curl-8.7.1.tar.gz
tar xzf curl-8.7.1.tar.gz
cd curl-8.7.1
./configure --prefix=$HOME/.local
make -j$(nproc)
make install
  • openssl-dev
sudo apt install libssl-dev

Below is a script produced by ChatGPT that did work

#!/bin/bash

set -e

echo "==> Installing system dependencies..."
sudo apt update
sudo apt install -y ghc cabal-install wget tar make \
    libghc-pandoc-dev libghc-tagsoup-dev libghc-http-conduit-dev libghc-blaze-html-dev

echo "==> Updating Cabal..."
cabal update

echo "==> Downloading mediawiki2latex 8.28..."
wget -O mediawiki2latex-8.28.tar.gz https://sourceforge.net/projects/wb2pdf/files/mediawiki2latex/8.28/mediawiki2latex-8.28.tar.gz/download

echo "==> Extracting source..."
tar xzf mediawiki2latex-8.28.tar.gz
cd mediawiki2latex-8.28

echo "==> Building mediawiki2latex..."
cabal v2-build

echo "==> Installing to ~/.local/bin ..."
cabal v2-install --installdir ~/.local/bin --overwrite-policy=always

echo "==> Build complete."

# Check if ~/.local/bin is in PATH
if ! echo "$PATH" | grep -q "$HOME/.local/bin"; then
  echo "==> Adding ~/.local/bin to PATH in ~/.bashrc"
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
  source ~/.bashrc
fi

echo "==> Installed mediawiki2latex version:"
~/.local/bin/mediawiki2latex --version

echo "==> Testing for --cachedir and --topheading support:"
~/.local/bin/mediawiki2latex --help | grep -E 'cachedir|topheading' || echo "❌ Options not found"

Here is an alternative method that I finally used

git clone git://git.code.sf.net/p/wb2pdf/git mediawiki2latex-sf

# Install build tools (once)
sudo apt update && sudo apt install -y ghc cabal-install

# Build and install
cd mediawiki2latex-sf
cabal update
cabal install --installdir ~/.local/bin --overwrite-policy=always

Manual install under Ubuntu using packages plus a new version on top

See the official links below first. We used this method in Feb 2019 on Ubuntu 18.x LTS.

(1) Install the (probably old) default version, which will also install lots of run time dependencies (compatible with your current Ubuntu system).

sudo apt-get install mediawiki2latex

(2) Then install the build-time dependencies (as root), as explained on the page "Benutzer:Dirk Hünniger/wb2pdf/installing", i.e. the packages below

 apt-get install ghc libghc-x509-dev libghc-pem-dev chromium chromium-sandbox
 apt-get install libghc-regex-compat-dev libghc-http-dev cabal-install libghc-hxt-dev
 apt-get install libghc-split-dev libghc-blaze-html-dev libghc-file-embed-dev
 apt-get install libghc-hxt-http-dev
 apt-get install libghc-temporary-dev libghc-url-dev libghc-utf8-string-dev
 apt-get install libghc-utility-ht-dev libghc-http-client-tls-dev libghc-happstack-server-dev
 apt-get install libghc-directory-tree-dev libghc-zip-archive-dev libghc-strict-dev
 apt-get install libghc-network-uri-dev libghc-tagsoup-dev libghc-word8-dev
 apt-get install ghostscript make latex2rtf libreoffice curl texlive-extra-utils
 apt-get install pdftk libimage-exiftool-perl

(3) Then install the new version from the git repository

git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
cd wb2pdf-git
sudo make
sudo make install

To update:

  • cd into the wb2pdf-git directory
sudo git pull
sudo make install

Important: Do this each time you update your LaTeX installation.

Mediawiki2Latex configuration

(1) Add a list of templates you want the system to ignore

  • Edit the config file:
/usr/share/mediawiki2latex/latex/templates.user
  • Or copy it and then use it with the "-t" option.

I, for example, had to add the following. Make sure to respect the syntax, e.g. no comma after the last entry, or else the program will just quit ...

.....
["tutoriel","LaTeXNullTemplate"],
["brouillon","LaTeXNullTemplate"],
["ebauche","LaTeXNullTemplate"],
["incomplet","LaTeXNullTemplate"],
["citation","LaTeXZeroBoxTemplate","1"],
["lien","LaTeXZeroBoxTemplate","1"]
]
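
Since a stray comma after the last entry is easy to produce by hand, entries of the kind shown above can also be generated. The Python sketch below only formats the lines for templates that should be dropped; the function name null_template_entries is hypothetical, and the output still has to be pasted into your copy of templates.user.

```python
def null_template_entries(names):
    """Format templates.user lines that map each name to LaTeXNullTemplate."""
    lines = [f'["{name}","LaTeXNullTemplate"]' for name in names]
    # Joining with ",\n" puts a comma between entries but none after the last.
    return ",\n".join(lines)


print(null_template_entries(["tutoriel", "brouillon", "ebauche", "incomplet"]))
```

If you paste the result in front of existing entries rather than at the end of the list, add one comma after the last generated line.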

(2) Add fonts if necessary (highly likely)

sudo dpkg -i fonts-unifont_15.1.01-1_all.deb

  • Install also the following fonts in the system

sudo apt-get install fonts-cmu

(3) Install imagemagick if it's not already in the system

sudo apt install imagemagick

Then adapt permissions and processing parameters as root in /etc/ImageMagick-6/policy.xml

<policy domain="coder" rights="read|write" pattern="PS" />
<policy domain="coder" rights="read|write" pattern="PS2" />
<policy domain="coder" rights="read|write" pattern="PS3" />
<policy domain="coder" rights="read|write" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="read|write" pattern="XPS" /> 
<policy domain="resource" name="memory" value="8GiB"/>
<policy domain="resource" name="map" value="8GiB"/>
<policy domain="resource" name="width" value="100KP"/>
<policy domain="resource" name="height" value="100KP"/>
<policy domain="resource" name="area" value="10GP"/>
<policy domain="resource" name="disk" value="20GiB"/>
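
To check which coder rights a policy file grants, it can be parsed with Python's standard XML module. This is a minimal sketch: the function name coder_rights is ours, and the sample string is a shortened stand-in for the real /etc/ImageMagick-6/policy.xml.

```python
import xml.etree.ElementTree as ET


def coder_rights(xml_text):
    """Return {pattern: rights} for every <policy domain="coder"> element."""
    root = ET.fromstring(xml_text)
    return {
        policy.get("pattern"): policy.get("rights")
        for policy in root.iter("policy")
        if policy.get("domain") == "coder"
    }


# Shortened sample, not the full system file.
sample = """<policymap>
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="read|write" pattern="PDF" />
  <policy domain="resource" name="memory" value="8GiB"/>
</policymap>"""

for pattern, rights in coder_rights(sample).items():
    print(pattern, rights)
```

Any pattern from the list above that is reported with rights "none" will block the conversion of that file type.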

(4) Increase LaTeX buffer size

It is very difficult to find correct information on how to do this.

  • Find the config file: kpsewhich texmf.cnf
  • Edit this file and add buf_size=10000000
The maximum size is supposed to be 12435455.

Some instructions in the official mediawiki2latex installation manual seem to be outdated as of June 2025.

I also played with other parameters, but cannot remember if they are needed.

buf_size=10000000
main_memory=20000000
pool_size=20000000
main_memory.xetex = 20000000
extra_mem_top.xetex = 1000000
extra_mem_bot.xetex = 1000000
main_memory.luatex = 20000000
extra_mem_top.luatex = 1000000
extra_mem_bot.luatex = 1000000
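
To verify that the values above really ended up in the file reported by kpsewhich, the file can be scanned for the relevant names. Below is a Python sketch under the assumption that each setting sits on its own line as "name = value" (or "name value") and that "%" starts a comment; the function name read_settings is made up.

```python
import re


def read_settings(cnf_text, names):
    """Return {name: value} for the requested settings found in texmf.cnf text."""
    found = {}
    for raw in cnf_text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments
        match = re.match(r"([A-Za-z_.]+)(?:\s*=\s*|\s+)(\S+)$", line)
        if match and match.group(1) in names:
            found[match.group(1)] = match.group(2)
    return found


# Example with inline text; for the real file, read the path printed by
# `kpsewhich texmf.cnf` and pass its contents instead.
text = "% memory settings\nbuf_size=10000000\nmain_memory.xetex = 20000000\n"
print(read_settings(text, {"buf_size", "main_memory.xetex", "pool_size"}))
```

Names that are missing from the result are not set in that file, which usually means that a different texmf.cnf was edited.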

Troubleshooting

Stopping without any message in the beginning

Make sure you configured enough memory.

Also, for reasons we do not understand, simply repeating the command sometimes helps.

mediawiki2latex: Prelude.read: no parse

Something is wrong in the template file, e.g. a missing quote or a trailing comma after the last element.
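
The error message does not say where the problem is. One way to locate it is to run the file through a JSON parser. This relies on an assumption: the template file contains nothing but a list of lists of double-quoted strings, a notation that happens to be valid JSON as well. The function name check_template_map is ours.

```python
import json


def check_template_map(text):
    """Return None if text is a list of string lists, else an error message."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return f"syntax error at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, list):
        return "top level is not a list"
    for index, entry in enumerate(data):
        if not (isinstance(entry, list) and all(isinstance(s, str) for s in entry)):
            return f"entry {index} is not a list of strings"
    return None


# A trailing comma after the last entry is reported with its position.
print(check_template_map('[["tutoriel","LaTeXNullTemplate"],\n]'))
```

For a real file, pass open("templates.user", encoding="utf-8").read(). A result of None means that no syntax problem was found by this check; it does not prove that mediawiki2latex will accept the file.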

Missing pictures will stop the program

If you link to a nonexistent picture, the program will crash (June 2025).

Here is an example. First line is OK, after second line it stopped.

MediaWiki2LaTeX-tmp-d95fda37057d5e71/document/images/321.jpg JPEG 2252x2316 2252x2316+0+0 8-bit sRGB 1.2829MiB 0.050u 0:00.062
mediawiki2latex: /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/322: withBinaryFile: does not exist (No such file or directory)

In order to figure out where this picture is:

  • open /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/321 and /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/323. The missing one is in between.
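
Finding the gap by eye gets tedious in books with hundreds of pictures. The Python sketch below lists the missing numbers, assuming (as in the log above) that the downloaded files are simply named 1, 2, 3, ... inside the temporary image directory. The function name missing_numbers is hypothetical.

```python
def missing_numbers(file_names):
    """Return the integers missing from the range spanned by numeric file names."""
    numbers = sorted(int(name) for name in file_names if name.isdigit())
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]


print(missing_numbers(["320", "321", "323", "324"]))
# For a real run, pass os.listdir(...) of the MediaWiki2LaTeXImages-* directory
# named in the error message (the suffix changes with every run).
```

Note that a missing file at the very end of the sequence cannot be detected this way, since nothing with a higher number was downloaded.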

Empty pictures will stop the program

Pictures included from Wikimedia Commons, in particular, may come out empty. Upload them to the local wiki instead.

To debug picture problems:

 lualatex -output-driver="xdvipdfmx -vv" main.tex

Missing chapter headings

  • In principle, each MediaWiki article should become a chapter in book mode. In version 8.7 this does not happen.

Workaround 1

  • Insert \chapter {} in the LaTeX file and run xelatex main.tex at least twice. This is really impractical, though.

A variant of this is to do this automatically. Here is a ChatGPT script I still have to test. It assumes that each page starts with ==Introduction==.

#!/usr/bin/env python3
import sys
import requests

def fetch_book_titles(bookpage_title):
    url = "https://edutechwiki.unige.ch/fr/api.php"
    params = {
        "action": "parse",
        "page": bookpage_title,
        "prop": "links",
        "format": "json"
    }

    r = requests.get(url, params=params)
    r.raise_for_status()
    data = r.json()

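    titles = []
    for link in data.get("parse", {}).get("links", []):
        title = link.get("*", "")
        # In the default JSON format, links to existing pages carry an
        # "exists" key; the key is absent for missing pages.
        if "exists" in link and not title.startswith("EduTech Wiki:"):
            titles.append(title)
    return titles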

def inject_chapters_into_tex(titles, tex_input, tex_output):
    with open(tex_input, encoding='utf-8') as f:
        lines = f.readlines()

    out_lines = []
    t_idx = 0
    for line in lines:
        if line.strip() == r'\section{Introduction}' and t_idx < len(titles):
            out_lines.append(f'\\chapter{{{titles[t_idx]}}}\n')
            t_idx += 1
        out_lines.append(line)

    with open(tex_output, 'w', encoding='utf-8') as f:
        f.writelines(out_lines)

    print(f"✔️ Injected {t_idx} chapter titles into {tex_output}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: script.py input.tex output.tex")
        sys.exit(1)

    book_page = "EduTech_Wiki:Livres/Broderie_numérique"
    titles = fetch_book_titles(book_page)
    inject_chapters_into_tex(titles, sys.argv[1], sys.argv[2])

Workaround 2

  • Insert a level 1 heading in each article within "includeonly" tags

Example:

 <includeonly>
 =Broderie machine=
 </includeonly>

Tables look bad

Whitespaces

Remove all whitespace after "|" or "||". That fixes some problems.

Rendering method

Otherwise, one can try "Chromium" tables in the mediawiki2latex options, or change the text. Under my WSL installation, Chromium tables do not seem to work, at least with the -t flag.

Tables with pictures

  • only put one picture in one cell
  • do not use row spans, the table must remain simple.

Some lines appear not as next line but are appended to an existing line

  • This is a weird bug and is related to cell contents :(
  • If a cell contains text that looks like a formula, e.g. 2*112.5= 225, the table will break, i.e. lines get appended. "=" must have spaces before and after, like so: " = ".
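
The spacing rule can be applied mechanically to the text of a cell. Here is a small Python sketch; the function name pad_equals is our own. It is meant for plain cell text only and should not be run over whole table rows, since attribute settings such as latexfontsize="scriptsize" also contain an equals sign.

```python
import re


def pad_equals(cell_text):
    """Put exactly one space on each side of every run of '=' characters."""
    return re.sub(r"[ \t]*(=+)[ \t]*", r" \1 ", cell_text)


print(pad_equals("2*112.5= 225"))
```

Runs such as "==" are kept together, so only the surrounding spacing changes.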

Debugging

You can ask mediawiki2latex to copy all the LaTeX files into a directory and then look at the LaTeX code

  • Use the -c option

To run the tex file manually, try cd document/main, then (if we are correct) lualatex main.tex (maybe xelatex ?)

Memory errors are probably related to mistakes in the LaTeX code that make it enter some kind of infinite loop.

LibreOffice installation and creation tips

As of Jan 7 2020:

(1) Get the latest libre office, read https://wiki.ubuntu.com/LibreOffice

sudo apt install python-software-properties
sudo apt-add-repository ppa:libreoffice/ppa
sudo apt update
sudo apt install libreoffice
$ libreoffice --version
 LibreOffice 6.3.4.2 30(Build:2)

(2) Make sure that ImageMagick has permission to transform PS and PDF files to PNG

In /etc/ImageMagick-6/policy.xml

 <policy domain="coder" rights="read|write" pattern="PS" />
 <policy domain="coder" rights="none|write" pattern="PS2" />
 <policy domain="coder" rights="none|write" pattern="PS3" />
 <policy domain="coder" rights="none|write" pattern="EPS" />
 <policy domain="coder" rights="read|write" pattern="PDF" />
 <policy domain="coder" rights="read|write" pattern="XPS" /> 

(3) (Fixed) In versions older than Jan 13, 2020, the ODF export could not find the image files; this is fixed now. At the time, I did the following: activate "copy to latex" (the -c option), then make sure that LibreOffice can find the images and formulas directories it is looking for, e.g. if you start from a PediaPress book definition:

mkdir somedirectory
cd somedirectory
mediawiki2latex -o ct.odf -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Initiation_%C3%A0_la_pens%C3%A9e_computationnelle_avec_JavaScript -k -c .

then

mv document/images/ . 
mv document/formulas/ .

Links