When transitioning from LaTeX to R Markdown, the syntax often offers
multiple ways of describing article components. For example, R Markdown
has several ways to include a figure: plain markdown image syntax,
`knitr::include_graphics()` within a code chunk, or, if you are
targeting HTML web formats, an `<img .. />` tag. Similar variations
exist in LaTeX, which is why it is important to understand how the
`texor` package transforms these environments and which options are
available to you.
Images are an essential component of any article. However, due to differences in the support for various graphic formats between LaTeX and markdown/HTML, we need to fall back on raster graphics.
Graphics Format | LaTeX | Markdown | R Markdown | HTML |
---|---|---|---|---|
PNG | ✓ | ✓ | ✓ | ✓ |
JPG | ✓ | ✓ | ✓ | ✓ |
PDF | ✓ | ✗ | ✗ | ✗ |
SVG | ✗ | ✓ | ✓ | ✓ |
TikZ | ✓ | ✗ | ✓ (using tikz engine) | ✗ |
Algorithm | ✓ | ✗ | ✗ | ✗ |
As the table above shows, raster images (PNG and JPG) are relatively easy to handle and do not require any additional pre-processing; pandoc handles them well.
PNG and JPG images are included as markdown images with captions, labels and other parameters.
Other formats, such as TikZ and algorithm environments, are first isolated and compiled into a PDF, which is then converted to a PNG using the pdftools package.
`\includegraphics{}`. This might interfere with the package's
ability to read the path and copy the relevant image.

In newer versions of the `texor` package, a new option is
available to convert figures into R Markdown style code chunks that
include the figures. If the option `fig_in_r` is set to `TRUE`, a custom
Lua filter will transform the image structure.
For example:
\begin{figure}
\includegraphics{image/sample.png}
\caption{This is a sample caption}
\label{fig:image-1}
\end{figure}
will be converted to
```{r , echo=FALSE , fig.cap="This is a sample caption", fig.alt="graphic without alt text",
fig.show='hold', fig.align="center", out.width="100%"}
knitr::include_graphics(c("image/sample.png"))
```
If the option `fig_in_r` is set to `FALSE`, a simple markdown
figure is generated instead.
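To make the two output shapes concrete, here is a minimal base R sketch that builds both from a path and a caption. The helpers `figure_to_chunk()` and `figure_to_markdown()` are hypothetical names used only for illustration; they are not part of `texor`, whose actual transformation is done by a Lua filter.

```r
# Hypothetical helpers (not part of texor): they only mimic the two output
# shapes described above so the difference is easy to see.

# Build a knitr chunk that includes one or more images.
figure_to_chunk <- function(paths, caption, alt = "graphic without alt text") {
  fence <- strrep("`", 3)  # chunk delimiter, built here to avoid a literal fence
  # Quote each path and join them into a c("a.png", "b.png") vector literal
  path_vec <- paste0("c(", paste0('"', paths, '"', collapse = ", "), ")")
  header <- sprintf(
    '%s{r , echo=FALSE , fig.cap="%s", fig.alt="%s", fig.show=\'hold\', fig.align="center", out.width="100%%"}',
    fence, caption, alt
  )
  c(header, sprintf("knitr::include_graphics(%s)", path_vec), fence)
}

# Build a plain markdown figure with an identifier.
figure_to_markdown <- function(path, caption, id) {
  sprintf("![%s](%s){#%s}", caption, path, id)
}

chunk <- figure_to_chunk(c("image/a.png", "image/b.png"), "Two panels")
cat(chunk, sep = "\n")

md <- figure_to_markdown("image/sample.png", "This is a sample caption", "fig:image-1")
cat(md, "\n")
```

Passing several paths to `figure_to_chunk()` mirrors the case of multiple images inside a single figure environment.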
- The default behaviour of the `texor` package is to generate R
  Markdown style code chunks that include images using
  `knitr::include_graphics(c(file_path))`. This mechanism was
  added using a custom Lua filter [1], which can also handle multiple
  images in a single figure environment. The Lua filter is open source
  and available under the MIT License.
- By setting `fig_in_r = FALSE` in the `rnw_to_rmd()` or
  `latex_to_web()` functions, you can revert to simple markdown
  figures, which have the following syntax:
  `![caption](file_path){#identifier ..}`.
- The `texor` package transforms the block into a raw HTML block so
  as to enable such out-of-the-box use of the figure environment in
  markdown.
- With `fig_in_r = FALSE`, the captions added to the figures will
  include figure numbering, which is also reflected in the references
  to the figures throughout the document. If you want to add figures or
  change their position, exercise caution.

Tables are commonly used in articles to display data in a tabular format. However, LaTeX and HTML handle tables differently.
LaTeX tables offer more customization and are usually optimized for printing, whereas web articles need tables optimized for varying screen sizes.
Pandoc converts most tables easily, but struggles with table customization packages and complex tables.
Some pandoc extensions are used to handle them: `simple_tables` and `pipe_tables`.
Limited multicolumn support is included [2].
\begin{table}[t!]
\begin{tabular}{l | llll }
\hline
EXAMPLE & $X$ & & $Y$ & \\
\hline
& 1 & 2 & 1 & 2 \\
EX1 & X11 & X12 & Y11 & Y12 \\
EX2 & X21 & X22 & Y21 & Y22 \\
EX3 & X31 & X32 & Y31 & Y32 \\
EX4 & X41 & X42 & Y41 & Y42\\
EX5 & X51 & X52 & Y51 & Y52 \\
\hline
\end{tabular}
\label{table1} \caption{An Example Table}
\end{table}
In newer versions [3] of the `texor` package, tables are by default
included using the `kable()` function. The data is stored in a CSV file,
and multiple data/object types are supported. However, there are some
limitations in handling the wide variety of tables available in LaTeX.
Hence, if the `texor` package recognizes a complex table, it will fall
back to simpler markdown tables for the rest of the document, in order
to retain the table structure and preserve the data in the article.
::: {#table1}
```{r table-1, echo = FALSE, results = 'asis'}
table_1_data <- read.csv("table_data_1.csv")
knitr::kable(table_1_data, caption="An Example Table")
```
:::
with the data tucked away in a CSV file.
EXAMPLE ,$X$ , ,$Y$ ,
,1 ,2 ,1 ,2
EX1 ,X11 ,X12 ,Y11 ,Y12
EX2 ,X21 ,X22 ,Y21 ,Y22
EX3 ,X31 ,X32 ,Y31 ,Y32
EX4 ,X41 ,X42 ,Y41 ,Y42
EX5 ,X51 ,X52 ,Y51 ,Y52
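As a quick check of what the chunk above has to work with, the following base R sketch reads the same CSV content into a data frame. The generated document reads it from `table_data_1.csv`; here the text is inlined through the `text` argument so the example is self-contained, and `strip.white = TRUE` is an added assumption to trim the padding around each field.

```r
# The CSV content shown above, inlined so the example runs on its own.
csv_text <- "EXAMPLE ,$X$ , ,$Y$ ,
,1 ,2 ,1 ,2
EX1 ,X11 ,X12 ,Y11 ,Y12
EX2 ,X21 ,X22 ,Y21 ,Y22
EX3 ,X31 ,X32 ,Y31 ,Y32
EX4 ,X41 ,X42 ,Y41 ,Y42
EX5 ,X51 ,X52 ,Y51 ,Y52"

# strip.white = TRUE removes the spaces that pad each field
table_1_data <- read.csv(text = csv_text, strip.white = TRUE)

# One header line plus six data rows, five columns each
dim(table_1_data)
print(table_1_data)
```

Because the second header row of the LaTeX table is stored as an ordinary data row, every column mixes numbers and text and is therefore read as character data.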
However, if you choose to use the option `kable_tab = FALSE`, the
resulting markdown would look like
::: {#table1}
-------------------------------------
EXAMPLE   \(X\)         \(Y\)
--------- ------- ----- ------- -----
          1       2     1       2
EX1       X11     X12   Y11     Y12
EX2       X21     X22   Y21     Y22
EX3       X31     X32   Y31     Y32
EX4       X41     X42   Y41     Y42
EX5       X51     X52   Y51     Y52
-------------------------------------
: Table 1: An Example Table
:::
- `kable()` tables are generated by default, unless a complex LaTeX
  table is identified, after which the package will fall back to
  simpler markdown tables.
- Support for `CodeBlocks` inside tables exists, but it is limited and
  can have unintended side effects.

Pandoc converts the verbatim environment easily; however, the
redefinition of other commands such as `example`, `example*`,
`Sinput`, etc. to verbatim does not work well in pandoc.
Hence the `texor` package uses a stream editor to search for
and replace matching code environments with verbatim before pandoc
touches them.
This way the code is not lost in conversion. A pandoc extension,
`fenced_code_attributes`, is also used to add attributes to the
markdown code.
example | S series | special verbatim |
---|---|---|
example, example* | Sin, Sout, Scode, Sinput, Soutput | smallverbatim, boxedverbatim |
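The search-and-replace step can be illustrated with a minimal base R sketch. This is not the `texor` implementation, which the text says relies on a stream editor; `to_verbatim()` is a hypothetical helper that uses `gsub()` instead and covers only the environments listed in the table above.

```r
# Environments the table above lists as redefinitions of verbatim
code_envs <- c("example", "example*", "Sin", "Sout", "Scode", "Sinput",
               "Soutput", "smallverbatim", "boxedverbatim")

# Hypothetical helper (not the texor implementation): rename every
# \begin{env} and \end{env} for the listed environments to verbatim.
to_verbatim <- function(lines, envs = code_envs) {
  for (env in envs) {
    for (cmd in c("begin", "end")) {
      old <- sprintf("\\%s{%s}", cmd, env)  # literal text, e.g. \begin{Sinput}
      new <- sprintf("\\%s{verbatim}", cmd)
      # fixed = TRUE matches literally, so "*" and "{" need no escaping
      lines <- gsub(old, new, lines, fixed = TRUE)
    }
  }
  lines
}

tex <- c("\\begin{Sinput}", "x <- 1", "\\end{Sinput}")
out <- to_verbatim(tex)
cat(out, sep = "\n")
```

Matching the full `\begin{...}` string, closing brace included, keeps `Sin` from accidentally matching inside `Sinput`.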
With updates to the `texor` package, we were able to add
support for Sweave articles, including retaining the code chunks
with their parameters. This functionality is not native to LaTeX and
is therefore unsupported in pandoc as well. To add this feature, we
designed a custom pandoc reader in Lua, which uses LPeg expressions to
separate the code chunks from the Sweave document while retaining the
parameters and code.
The above Sweave code block would be converted to
Similarly, inline code expressions in Sweave defined by
`\Sexpr{'0.6'}`
will be transformed to
` r '0.6'`
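The inline rewrite can be sketched with a single regular expression in base R. This is only an illustration: `texor` performs the step inside its custom Lua reader, `sexpr_to_inline()` is a hypothetical helper, and the pattern assumes the expression contains no closing brace.

```r
# Hypothetical helper (not the texor reader): rewrite \Sexpr{...} as inline
# R code. Assumes the expression itself contains no "}" character.
sexpr_to_inline <- function(text) {
  tick <- "`"  # kept separate so this example is not parsed as inline code
  gsub("\\\\Sexpr\\{([^}]*)\\}", paste0(tick, "r \\1", tick), text, perl = TRUE)
}

converted <- sexpr_to_inline("The value is \\Sexpr{'0.6'}.")
cat(converted, "\n")
```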
Math typesetting has always been LaTeX’s highlight feature, making it a de facto choice among academicians and researchers globally. However, math is harder to render on the web. JavaScript libraries have advanced to typeset and present math better in web pages, but not all LaTeX commands and math functions are available.
The texor package uses MathJax version 3 to enhance the visual look of the math content in HTML. There is support for equations, inline math, and equation numbering.
In bookdown, un-labelled equations are not numbered automatically.
This can be tricky to deal with if we want the equation numbering to
match the one in LaTeX. To circumvent this issue, we first transform
the equation labels, both in the equations and in the references, to
follow the specification prescribed by bookdown: labels in equations
are changed from `\label{..}` to `(\#eq:..)` and references from
`\ref{..}` to `\@ref(eq:..)`, which ensures compatibility. Next, for
equations without labels and without `\nonumber` commands, we
automatically add an equation label of the form
`(\#eq:autonumber..)`, where the label ends with the equation number.
To enable this, set the option `autonumber_eq` to `TRUE`
while converting the documents.
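The label and reference mapping can be sketched in base R as two substitutions. These helpers are hypothetical and only illustrate the mapping described above; they are not the `texor` implementation. They assume the label text is reused unchanged and that every `\ref{..}` passed in points at an equation.

```r
# Hypothetical helpers (not the texor implementation): map LaTeX equation
# labels and references to the bookdown forms named in the text.

# \label{name} becomes (\#eq:name)
label_to_bookdown <- function(text) {
  gsub("\\\\label\\{([^}]*)\\}", "(\\\\#eq:\\1)", text, perl = TRUE)
}

# \ref{name} becomes \@ref(eq:name); assumes the reference targets an equation
ref_to_bookdown <- function(text) {
  gsub("\\\\ref\\{([^}]*)\\}", "\\\\@ref(eq:\\1)", text, perl = TRUE)
}

eq  <- label_to_bookdown("E = mc^2 \\label{energy}")
ref <- ref_to_bookdown("see equation \\ref{energy}")
cat(eq, "\n")
cat(ref, "\n")
```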
The aim of the `texor` package is to convert LaTeX
source code to an R Markdown file, which can then be knitted into
different web formats.
For R Journal articles, we prefer to use the template from the
`rjtools` package, `rjtools::rjournal_web_article`.
For Sweave articles, however, we have opened up options for vignette styles.

- `bookdown`: The most common choice for `output_format` is the
  `bookdown` option, which uses `bookdown::html_document2` for the
  vignette.
- `litedown`: For a lightweight HTML vignette, set the option
  `output_format` to `litedown`, which uses `litedown::html_format`
  for the vignette.
- `biocstyle`: Commonly used among Bioconductor vignettes; set the
  option `output_format` to `biocstyle`, which uses
  `BiocStyle::html_document` for the vignette.

`litedown` is, as of now, an experimental package.

The `bookdown` option includes a header file with custom
JavaScript to incorporate section numbering and normal equation
numbering (i.e. eq 1,2,3,.. instead of 1.1,1.2,2.1,3.1,..). This is
included when `autonumber_sec` is set to `TRUE`.

The `bookdown` option sets `math_mode` to `katex` by default, unless
the `autonumber_eq` option is set to `TRUE`, in which case `math_mode`
is `mathjax`.