PDF Export for a Self-Hosted Outline Instance
Intro
TL;DR: I want to export an Outline collection to PDF and will experiment with Obsidian later.
Why Outline?
Outline is a clean and powerful knowledge base solution that fits some needs better than Obsidian. Don’t get me wrong: Obsidian is an excellent tool as well, and both have their strengths.
Why Obsidian is great
Your notes are just Markdown files. Even if you stop using Obsidian (or if it stops being free), your data remains readable and accessible in any Markdown editor.
It just feels like there is a lot of thought put into the app and it is actively developed. There are many features that just feel right, like the link graph or canvases, and the new "Bases" feature.
Where Obsidian fell short for me
In practice, I found self-managed syncing frustrating.
- I tried Synology Drive with my Obsidian vault stored on my NAS. While it worked, it wasn’t smooth: I had to install the app on every device, deal with sync jobs getting killed on my phone, or keep an annoying permanent notification running just to maintain sync.
- Exporting to PDF only works per document and only from within the Obsidian client. There is no CLI or API for it, only community plugins that also run in-app and only work as long as they are maintained.
- Converting the raw Markdown files to PDF works for basic content, but Obsidian-specific Markdown syntax is not rendered.
- Obsidian has no official API and is closed source. There are plugins that add a REST API or a socket for sending commands, but they are all unofficial and were developed and then abandoned.
The whole setup felt clunky and distracted from actually using the tool. I might look into setting up LiveSync or Syncthing in the future.
Where Outline works better for me
With Outline, some things are more seamless:
- Self-hosted on my VPS → I can access it from anywhere, just using a browser.
- Cross-device → Switching between laptop and phone feels natural.
- Collaboration → Real-time editing and live collaboration built in.
- Open Source + API → I can write a script that interacts with the instance
Export to PDF
PDF export is only available per document, and only in the Business and Enterprise editions of Outline. The community edition gives you JSON, HTML and (Outline-flavoured) Markdown.
Under the hood, the PDF export uses Gotenberg, which (I guess) receives the HTML via POST and converts it to PDF. Gotenberg offers a demo instance, which I used to convert an example document.
curl \
--request POST https://demo.gotenberg.dev/forms/chromium/convert/html \
--header 'Content-Type: multipart/form-data' \
--form files=@index.html \
--form files=@attachments/example.png \
-o my.pdf
You can send the HTML (the file has to be named index.html) together with all attachments such as pictures. The attachments must not be in a subfolder, though, so I would need to modify the HTML first.
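A minimal sketch of what that HTML modification could look like, assuming the exported HTML references attachments through relative, double-quoted src attributes such as attachments/example.png. The function name flattenAttachmentPaths is hypothetical, and the regular expression is a shortcut; a real HTML parser would be more robust, and two attachments with the same file name in different folders would collide.

```javascript
// Hypothetical helper: rewrite relative src paths so that they point to files
// sitting directly next to index.html, as a flat upload requires.
function flattenAttachmentPaths(html) {
    return html.replace(/src="([^"]+)"/g, (match, src) => {
        // Leave absolute URLs (https:, data:, ...) untouched.
        if (/^[a-z][a-z0-9+.-]*:/i.test(src)) return match;
        // Keep only the file name, drop every folder in front of it.
        const baseName = src.split("/").pop();
        return `src="${baseName}"`;
    });
}

const input = '<img src="attachments/example.png"><img src="https://example.com/a.png">';
console.log(flattenAttachmentPaths(input));
// prints: <img src="example.png"><img src="https://example.com/a.png">
```

The attachment files themselves would then be uploaded under their plain file names, which is what curl already does with --form files=@attachments/example.png.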
The plan
Here is what I want to improve:
- My exporter should be able to export whole collections or documents with all subdocuments, not just one single document
- It should include all inline pictures and maybe also attached pdf files
- Links to other documents should work (scroll to page in pdf)
- Add Page Numbers
Exporting the collection
This can be done via the Outline API:
- trigger collections.export with format html
- poll the file operation via fileOperations.info
- get the file URL using fileOperations.redirect
- clean up the file using fileOperations.delete
This gives us a zip file with all the HTML files and attachments.
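The first two steps could be sketched roughly like this. The apiRequest(endpoint, body) function is passed in and is assumed to POST to the Outline API and resolve with the parsed JSON response. The response shapes (data.fileOperation.id, data.state) and the state names "complete" and "error" are my assumptions and should be checked against the API documentation of your Outline version; exportCollection, intervalMs and maxTries are made-up names.

```javascript
// Sketch: trigger an HTML export and wait until the file operation is done.
// Response shapes and state names are assumptions, not verified API facts.
async function exportCollection(apiRequest, collectionId, { intervalMs = 2000, maxTries = 60 } = {}) {
    // 1. trigger the export
    const started = await apiRequest("collections.export", {
        id: collectionId,
        format: "html",
    });
    const operationId = started.data.fileOperation.id;

    // 2. poll the file operation until it has finished
    for (let attempt = 0; attempt < maxTries; attempt++) {
        const info = await apiRequest("fileOperations.info", { id: operationId });
        if (info.data.state === "complete") return operationId;
        if (info.data.state === "error") throw new Error("Export failed");
        // wait a moment before asking again
        await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
    throw new Error("Export did not finish in time");
}
```

With the returned id, the zip can then be fetched through fileOperations.redirect and removed again with fileOperations.delete. Passing apiRequest as a parameter keeps the polling logic testable without a running Outline instance.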
So exporting the whole collection - check ✅
Include the inline images - check ✅
Get document infos
Documents can be ordered manually in Outline. To get this structure, I request the document tree via the API using collections.documents and add every document to a map, so I can find documents by their path relative to the collection root folder:
async function getDocumentsList(collection_id) {
    const documentStructure = await apiRequest('collections.documents', {
        id: collection_id
    });
    const documentsByPath = new Map();
    let order = 0;
    // walk the tree depth-first so that `order` follows the tree from top to bottom
    const addToMap = (item, parentPath) => {
        const itemPath = parentPath + "/" + item.title;
        item.order = order++;
        documentsByPath.set(itemPath, item);
        if (item.children && item.children.length) {
            item.children.forEach(child => {
                addToMap(child, itemPath)
            });
        }
    };
    documentStructure.data.forEach(item => {
        addToMap(item, "")
    });
    return documentsByPath;
}
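To look up a document for a file from the unpacked zip, the file path has to be turned into the same kind of key. A small sketch, assuming the exported file names match the document titles and the path is given relative to the collection root folder; htmlPathToKey is a hypothetical helper, and the map below is hand-built sample data in the shape returned by getDocumentsList.

```javascript
// Hypothetical helper: "Guides/Setup.html" -> "/Guides/Setup"
function htmlPathToKey(relativeHtmlPath) {
    const withoutExtension = relativeHtmlPath.replace(/\.html$/i, "");
    // accept both "/" and "\" as separators and ignore empty segments
    return "/" + withoutExtension.split(/[\\/]/).filter(Boolean).join("/");
}

// Made-up sample data
const documentsByPath = new Map([
    ["/Guides", { id: "a1", order: 0 }],
    ["/Guides/Setup", { id: "b2", order: 1 }],
]);

const doc = documentsByPath.get(htmlPathToKey("Guides/Setup.html"));
console.log(doc.id, doc.order); // b2 1
```

Sorting the files by doc.order afterwards gives the same sequence as the document tree in Outline.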
Converting to PDF
For this I use Puppeteer, which means a Chromium browser in the background does all the heavy lifting for me:
const browser = await puppeteer.launch();
const page = await browser.newPage();
for (const file of files) {
    // one PDF per document, named after its position in the collection
    const pdfPath = path.join(tmp_pdf, "document_" + file.document.order + ".pdf");
    const pdfFolder = path.dirname(pdfPath);
    await mkdirp(pdfFolder);
    file.pdf_path = pdfPath;
    // load the exported HTML from disk and wait until there is no more network activity
    const absoluteHtmlPath = "file://" + file.path;
    await page.goto(absoluteHtmlPath, { waitUntil: "networkidle0" });
    await page.pdf({
        path: file.pdf_path,
        format: "A4",
        printBackground: true,
        tagged: true,
        displayHeaderFooter: true,
    });
}
await browser.close();
After that I merge the pages using pdf-lib:
const pdfDoc = await PDFDocument.create();
pdfDoc.setProducer("pax by CodingKiwi");
pdfDoc.setCreationDate(new Date());
for (const file of files) {
    const fileData = await fs.readFile(file.pdf_path);
    const fileDoc = await PDFDocument.load(fileData);
    const indices = fileDoc.getPageIndices();
    const copiedPages = await pdfDoc.copyPages(fileDoc, indices);
    copiedPages.forEach((page) => {
        pdfDoc.addPage(page)
    });
}
Getting Links working
I want the links to work. Currently, linking to another document results in a relative link like this:
<a href="./otherfile.html">Other File</a>
<a href="./../otherfile.html">Other File outside the current folder</a>
The problem with this approach is that each HTML page is rendered separately. Links to other files are "broken" because the other file is not part of the PDF, so Puppeteer renders the link as non-clickable text.
My first idea was to merge all HTML documents into one giant HTML file. This felt a bit clunky because each HTML file has its own CSS style definitions, and some HTML files are full width while others are not (this depends on tables inside Outline, for example).
I would also need to move all attachments into the same folder as the merged HTML for the images to work, since the HTML would no longer be at the same location relative to the image sources.
After fiddling around for a bit I ended up with a different approach:
First, I modify the links so they survive the PDF phase:
for (const file of files) {
    ...
    //replace links
    const links = await page.$$("a[href]");
    for (const link of links) {
        const href = await (await link.getProperty("href")).jsonValue();
        if (!href.startsWith(docLinkBase)) continue;
        // the href property is percent-encoded (e.g. %20 for spaces), file.path is not
        const filePath = decodeURIComponent(href.replace("file://", ""));
        const target = files.find(f => f.path === filePath);
        if (!target) {
            logger.debug("Could not find file for %s", href);
            continue;
        }
        // placeholder URL that carries the id of the target document
        await link.evaluate((el, newHref) => {
            el.setAttribute("href", newHref);
        }, "http://replace-me.com/#" + target.document.id);
    }
    await page.pdf(...);
}
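Inside the browser, the href property is already an absolute file:// URL, which is why it can be compared with the file paths. The same resolution can be reproduced outside the browser with the standard URL class, which may help when debugging why a link is not found. This is only an illustration: resolveLinkToPath is a made-up name, the paths are sample values, and POSIX-style paths are assumed.

```javascript
// Resolve a (relative) href against the file:// URL of the page that contains
// it and return a decoded file system path, or null for external links.
function resolveLinkToPath(href, pageUrl) {
    const resolved = new URL(href, pageUrl);
    if (resolved.protocol !== "file:") return null;
    // pathname is percent-encoded, so decode it before comparing with paths on disk
    return decodeURIComponent(resolved.pathname);
}

console.log(resolveLinkToPath("./../Other%20File.html", "file:///tmp/export/Guides/Setup.html"));
// prints: /tmp/export/Other File.html
console.log(resolveLinkToPath("https://example.com/", "file:///tmp/export/Guides/Setup.html"));
// prints: null
```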
I also keep track of the page on which each document starts:
let pageCounter = 0;
for (const file of files) {
    ...
    await page.pdf(...);
    // zero-based index of the document's first page in the merged PDF
    file.page = pageCounter;
    file.page_count = await getPageCount(file.pdf_path);
    pageCounter += file.page_count;
}
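The bookkeeping is a simple running sum. Here it is in isolation with made-up page counts; assignPageOffsets is a hypothetical name, and the property names page and page_count follow the loop in this post.

```javascript
// Give every file the zero-based index of its first page in the merged PDF.
// Returns the total number of pages.
function assignPageOffsets(files) {
    let pageCounter = 0;
    for (const file of files) {
        file.page = pageCounter;
        pageCounter += file.page_count;
    }
    return pageCounter;
}

// Three documents with 3, 1 and 2 pages
const files = [{ page_count: 3 }, { page_count: 1 }, { page_count: 2 }];
const total = assignPageOffsets(files);
console.log(files.map(f => f.page).join(", "), "| total:", total);
// prints: 0, 3, 4 | total: 6
```

This only works if the files are processed in the same order in which they are later merged.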
This way I can modify the links in the final PDF using dark arcane PDF magic:
get the id from the link → get the file from the id → get the page number from the file → change the link from a URI action to a GoTo action
const pages = pdfDoc.getPages();
pages.forEach(page => {
    page.node.Annots()?.asArray().forEach((a) => {
        const dict = pdfDoc.context.lookupMaybe(a, PDFDict);
        // skip annotations that have no action dictionary
        const aRecord = dict?.get(asPDFName(`A`));
        if (!aRecord) return;
        const link = pdfDoc.context.lookupMaybe(aRecord, PDFDict);
        // skip actions that are not URI links
        const uriObject = link?.get(asPDFName("URI"));
        if (!uriObject) return;
        const uri = uriObject.toString().slice(1, -1); // get the original link, remove parentheses
        if (uri.startsWith("http://replace-me.com/#")) {
            const id = uri.replace("http://replace-me.com/#", "");
            const target = files.find(f => f.document.id === id);
            if (!target) return;
            // turn the URI action into a GoTo action that jumps to the target page
            link.set(asPDFName('S'), asPDFName('GoTo'));
            const targetPageRef = pdfDoc.getPage(target.page).ref;
            const destination = PDFArray.withContext(pdfDoc.context);
            destination.push(targetPageRef);
            destination.push(asPDFName('Fit'));
            link.set(asPDFName('D'), destination);
        }
    });
})
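The lookup chain (link → id → file → page) can be tried out without any PDF involved. A minimal sketch with made-up sample data; resolveTargetPage is a hypothetical helper that returns null for links that should stay untouched.

```javascript
const PLACEHOLDER_PREFIX = "http://replace-me.com/#";

// Returns the zero-based target page for a placeholder URI, or null when the
// URI is a normal external link or points to an unknown document.
function resolveTargetPage(uri, files) {
    if (!uri.startsWith(PLACEHOLDER_PREFIX)) return null;
    const id = uri.slice(PLACEHOLDER_PREFIX.length);
    const target = files.find(f => f.document.id === id);
    return target ? target.page : null;
}

// Made-up sample data in the shape used throughout this post
const files = [
    { document: { id: "doc-a" }, page: 0 },
    { document: { id: "doc-b" }, page: 3 },
];
console.log(resolveTargetPage("http://replace-me.com/#doc-b", files)); // 3
console.log(resolveTargetPage("https://example.com/", files));         // null
```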
Links to other documents - check ✅
Adding page numbers
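For this I add a little margin at the bottom of each rendered page and draw the page number inside the loop over the merged pages we just created:
// Puppeteer: reserve some space at the bottom of every page
await page.pdf({
    displayHeaderFooter: false,
    margin: {
        bottom: 50
    }
});
...
// pdf-lib: here `page` is a page of the merged PDF
const { width } = page.getSize();
page.drawText(pageNr + " / " + pages.length, {
    x: width - 50,
    y: 20,
    size: 8
})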
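One detail worth spelling out: if the page number comes from a zero-based loop index, it needs a + 1 before it is printed. A small sketch of the label and its position, assuming the index is taken from something like pages.forEach((page, index) => ...); pageNumberLabel is a hypothetical helper, and 595 is roughly the width of an A4 page in PDF points.

```javascript
// Build the text and drawing options for the page number of one page.
function pageNumberLabel(pageIndex, totalPages, pageWidth) {
    return {
        text: (pageIndex + 1) + " / " + totalPages, // one-based for humans
        options: { x: pageWidth - 50, y: 20, size: 8 },
    };
}

const label = pageNumberLabel(0, 12, 595);
console.log(label.text);      // 1 / 12
console.log(label.options.x); // 545
```

The result can be passed on as page.drawText(label.text, label.options).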
Page Numbers - check ✅