Compressing an Obsidian vault into a single PDF

CodingKiwi

13 Sep 2025

As mentioned in my other post, Obsidian is a great tool for writing documentation.

I want to set aside the cross-device syncing issues for now, since those can likely be solved by either paying for Obsidian Sync or setting up LiveSync (which I haven’t done yet).

For now, how can I create a single PDF from my Obsidian vault?

Cleaning the markdown

Since Obsidian doesn’t have a real API, and I don’t want to rely on a plugin that could break whenever Obsidian updates, I prefer to work directly with the raw Markdown files.

This approach comes with a small compromise: it assumes your Markdown doesn’t rely on too many non-standard features. If you’ve heavily customized Obsidian with dozens of “productivity” plugins, this method might not work for you.

First, I pass the Obsidian vault through obsidian-export, a Rust tool that exports the vault to regular Markdown. This resolves [[note]] wikilinks to normal Markdown links.

Fixing Links

Since the Markdown files may sit in a nested folder structure and I want to merge all pages into one document, the relative links need to be fixed.

My rule for a page link: a document at /my-vault/example/subfolder/test.md has the unique relative file path example/subfolder/test.md, which is slugified to example-subfolder-test.
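The post's own filenameToAnchor is not shown here, so the following is only a minimal sketch of the slug rule. The function name relativePathToAnchor and the exact character handling (lowercasing, replacing everything outside a-z and 0-9 with a dash) are assumptions for illustration.

```javascript
// Hypothetical stand-in for the slug rule described above.
// Takes a path relative to the vault root and returns an anchor id.
function relativePathToAnchor(relativePath) {
    return relativePath
        .replace(/\\/g, "/")            // tolerate Windows separators
        .replace(/\.md$/i, "")          // drop the file extension
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")    // slashes, spaces, punctuation -> "-"
        .replace(/^-+|-+$/g, "");       // trim leading and trailing dashes
}

console.log(relativePathToAnchor("example/subfolder/test.md"));
// example-subfolder-test
```

With a rule this simple, two different paths can collapse into the same slug (for example a/b.md and a-b.md), so the anchor is only unique as long as the vault avoids such names.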

So for each link in the Markdown:

get the referenced file path and compute its path relative to the vault root
derive the unique anchor from this relative path
use the anchor as the link target

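import path from "node:path";

// Rewrites links to other .md files into in-document anchor links.
// filenameToAnchor (not shown here) maps an absolute file path to its
// unique anchor and returns a falsy value for unknown files.
export function localizeFileLinks(content, filePath) {
    const fileFolder = path.dirname(filePath);

    return content.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (match, text, link) => {
        if (link.endsWith(".md") || link.includes(".md#")) {
            const [targetFile, hash] = link.split("#");

            // Links are relative to the current file, so resolve from its folder
            const absTarget = path.resolve(fileFolder, targetFile);
            const anchor = filenameToAnchor(absTarget);

            if (anchor) {
                return `[${text}](#${anchor}${hash ? "-" + hash : ""})`;
            } else {
                console.log(absTarget + " not found");
            }
        }
        // Everything else (external URLs, unresolved targets) stays untouched
        return match;
    });
}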

For images, I simply compute their path relative to the vault root and leave the files where they are. This way, if I put the merged Markdown at the vault root, the image links point to the correct location.

export function localizeImageLinks(content, filePath, rootDir) {
    const fileFolder = path.dirname(filePath);

    return content.replace(/!\[([^\]]*)\]\(([^)]+)\)/g, (match, alt, imgPath) => {
        // Remote images and data URIs are not files in the vault, leave them alone
        if (/^(https?:\/\/|data:)/i.test(imgPath)) return match;

        const absImg = path.resolve(fileFolder, imgPath);
        const relToRoot = path.relative(rootDir, absImg);
        return `![${alt}](${relToRoot})`;
    });
}

Fixing Headlines

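Since nothing prevents you from adding level-1 headings in Obsidian, I decided to normalize the headings: if a file contains a level-1 heading, every heading in that file is demoted by one level, so the highest level left in the body is level 2.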

export function normalizeHeadings(markdown) {
    // Only shift headings if the file actually uses a level-1 heading
    const hasH1 = /^# .+/m.test(markdown);
    if (!hasH1) return markdown;

    // Demote every heading by one level (# -> ##, ## -> ###, ...)
    return markdown.replace(/^(#+)(\s+)/gm, (match, hashes, space) => {
        return hashes + "#" + space;
    });
}
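One thing to keep in mind: the regex above looks at every line that starts with "#", which includes comment lines inside fenced code blocks (shell or Python snippets, for example). If that matters for your vault, a line-based variant could skip fenced regions. This is only a sketch; the function name normalizeHeadingsOutsideFences and the simple fence detection are assumptions, not part of the original tool.

```javascript
// Hypothetical variant of normalizeHeadings that ignores fenced code blocks.
function normalizeHeadingsOutsideFences(markdown) {
    const lines = markdown.split("\n");
    const isFence = (line) => /^\s*(`{3}|~{3})/.test(line);

    // First pass: is there a level-1 heading outside of code fences?
    let inFence = false;
    const hasH1 = lines.some((line) => {
        if (isFence(line)) inFence = !inFence;
        return !inFence && /^# \S/.test(line);
    });
    if (!hasH1) return markdown;

    // Second pass: demote headings, but leave fenced content untouched
    inFence = false;
    return lines
        .map((line) => {
            if (isFence(line)) {
                inFence = !inFence;
                return line;
            }
            // Stop at level 5 so the result is still a valid heading (max level 6)
            return !inFence && /^#{1,5}\s/.test(line) ? "#" + line : line;
        })
        .join("\n");
}
```

The fence detection simply toggles on every line that starts with three backticks or tildes, which is enough for typical notes but does not cover every edge case of the Markdown spec.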

This allows me to use the file name as the level-1 headline, together with the unique anchor I mentioned above:

export async function processMarkdownFile(file, rootDir) {
    ...
    
    const headline = path.basename(file).replace(/\.md$/, "");
    const anchor = filenameToAnchor(file);

    let markdown = "";

    //first the anchor to jump to this .md file
    markdown += `<a class="header-anchor" id="${anchor}"></a>\n\n`;

    //the main h1 headline
    markdown += "# " + headline + "\n\n";

    //now the body
    markdown += parsed.body;

    //force page break after
    markdown += "<div style='page-break-after:always'>&nbsp;</div>\n\n";

    return markdown;
}

Fixing Anchor Links

The setup above works for links to other Markdown files, but not for links to headlines inside those files. With the code shown so far, a link like

[Other Headline](../other-file.md#other-headline)

currently gets converted to

[Other Headline](#subfolder-other-file-other-headline)

That target is a headline id that does not exist, because so far we only create anchors for the level-1 headings.

Anchor links that reference a heading inside the same Markdown file, like

[Headline in same file](#headline)

are also a problem, because a) nothing guarantees that "headline" does not also exist as a headline somewhere else in the vault, and b) non-level-1 headlines do not even have ids yet.

I fix this in the conversion step from markdown to html with a plugin:

import markdownItAnchor from "markdown-it-anchor";

md.use(markdownItAnchor, {
    level: 2
});

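This adds anchors, but it derives each id from the heading text alone (## heading gets the anchor id heading), so identically named headings from different files cannot be told apart by a link.

After some tinkering, I ended up with this complicated-looking custom markdown-it plugin: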

md.use((md) => {
    md.core.ruler.push('heading_parent', function (state) {
        // Ids of the per-file anchors seen so far; the last one is the current file
        let h1Stack = [];

        state.tokens.forEach((token, index) => {
            if (token.type === 'inline') {
                // Detect the file anchor emitted in processMarkdownFile
                const match = token.content.match(
                    /<a\s+class="header-anchor"\s+id="([^"]+)"><\/a>/
                );

                if (match) {
                    const lastAnchorId = match[1];
                    h1Stack.push(lastAnchorId);
                }
            }

            if (token.type === 'heading_open') {
                const level = parseInt(token.tag[1], 10);

                if (level > 1 && h1Stack.length > 0) {
                    // Associate with last <h1> token
                    const parentH1 = h1Stack[h1Stack.length - 1];

                    // The heading text lives in the inline token right after heading_open
                    let title = state.tokens[index + 1].children
                        .filter(t => ['text', 'code_inline'].includes(t.type))
                        .map(t => t.content)
                        .join('');

                    state.tokens[index].attrSet("id", parentH1 + "-" + slugify(title))
                }
            }
        });
    });
});

Here is what it does:

loop over the tokens of the Markdown document
for every token that is one of the level-1 anchors we created, push its id into an array
for every token that is a non-level-1 heading, take the last entry of the level-1 anchor ids (basically the "parent" level-1 header) plus the current heading text and create a unique slug from it

This way all headings have unique anchors and the fixed links work!
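To make the id scheme easier to follow without markdown-it in the way, here is a small sketch of the same idea on a simplified token list. The token shapes (file_anchor, heading), the function assignHeadingIds and this slugify are all made up for illustration; the real plugin works on markdown-it tokens and uses whatever slugify the project imports.

```javascript
// Hypothetical slugify, standing in for the one used by the real plugin.
function slugify(text) {
    return text
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/^-+|-+$/g, "");
}

// Simplified stand-in for the token walk: remember the most recent file
// anchor and prefix every non-level-1 heading with it.
function assignHeadingIds(tokens) {
    let currentFileAnchor = null;

    return tokens.map((token) => {
        if (token.type === "file_anchor") {
            currentFileAnchor = token.id;
            return token;
        }
        if (token.type === "heading" && token.level > 1 && currentFileAnchor) {
            return { ...token, id: currentFileAnchor + "-" + slugify(token.title) };
        }
        return token;
    });
}

const result = assignHeadingIds([
    { type: "file_anchor", id: "subfolder-other-file" },
    { type: "heading", level: 1, title: "other-file" },
    { type: "heading", level: 2, title: "Other Headline" },
    { type: "file_anchor", id: "notes-intro" },
    { type: "heading", level: 2, title: "Other Headline" },
]);

result
    .filter((token) => token.type === "heading" && token.level > 1)
    .forEach((token) => console.log(token.id));
// subfolder-other-file-other-headline
// notes-intro-other-headline
```

The first id is exactly the target that the rewritten link [Other Headline](#subfolder-other-file-other-headline) points to, and the same heading text in a second file gets a different id.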

Adding a PDF Outline

PDFs can have an “outline” that appears as a navigation menu in most PDF viewers. To create this, I need the page number for each headline - but that’s tricky, because I can’t know in advance how much space a Markdown document will take, and therefore how many pages it will occupy.

I decided to parse the final PDF after it has been generated. So first, generate the PDF:

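// or "puppeteer-core", since an explicit executablePath is given below
import puppeteer from "puppeteer";
import { pathToFileURL } from "node:url";

export async function htmlToPdf(htmlPath, pdfPath) {
    const browser = await puppeteer.launch({
        headless: true,
        defaultViewport: null,
        executablePath: '/usr/bin/google-chrome',
        args: ['--no-sandbox'],
    });

    try {
        const page = await browser.newPage();

        // pathToFileURL escapes spaces and special characters in the path
        await page.goto(pathToFileURL(htmlPath).href, { waitUntil: "networkidle0" });

        // Export to PDF
        await page.pdf({
            path: pdfPath,
            format: "A4",
            printBackground: true,
            displayHeaderFooter: false,
            margin: { top: "20mm", bottom: "20mm", left: "20mm", right: "20mm" },
        });
    } finally {
        // Close the browser even if rendering fails
        await browser.close();
    }
}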

Now get the headlines:

export async function getHeadlines(filePath) {
    const loadingTask = pdfjsLib.getDocument(filePath);
    const pdf = await loadingTask.promise;

    let parsed = [];
    let biggestFontSize = 0;

    for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i);
        const content = await page.getTextContent();

        const texts = content.items.map(item => {
            const size = item.transform[0];

            if (size > biggestFontSize) biggestFontSize = size;

            return {
                text: item.str,
                font: item.fontName,
                size: size
            }
        });

        parsed.push({
            pageNr: i,
            texts
        });
    }

    let chapters = [];

    parsed.forEach(page => {
        let headlineText = page.texts.filter(text => text.size === biggestFontSize).map(text => text.text);

        if (headlineText.length) {
            chapters.push({
                pageNr: page.pageNr,
                headline: headlineText.join("")
            })
        }
    });

    return chapters;
}

As you can see, this approach is a bit unconventional. When parsing a PDF, there’s no metadata indicating whether a piece of text is an <h1> headline; you only get positions and dimensions. So I assume that a piece of text is an <h1> if it uses the largest font size in the document. I’m not entirely happy with this method, but so far I haven’t come up with a better solution.
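The selection step compares font sizes with ===, which only works if the extracted sizes are exactly equal. Whether pdf.js ever reports slightly different values for the same style is an assumption on my side, but if it does, a small tolerance makes the heuristic more forgiving. A sketch, operating on the same page/text structure that getHeadlines builds (pickHeadlines and the tolerance value are hypothetical):

```javascript
// Hypothetical helper: picks the headline of each page from extracted text items.
// `pages` has the shape built in getHeadlines: [{ pageNr, texts: [{ text, size }] }]
function pickHeadlines(pages, tolerance = 0.01) {
    // Largest font size across the whole document
    const biggest = pages
        .flatMap((page) => page.texts.map((t) => t.size))
        .reduce((max, size) => Math.max(max, size), 0);

    return pages
        .map((page) => ({
            pageNr: page.pageNr,
            headline: page.texts
                .filter((t) => Math.abs(t.size - biggest) <= tolerance)
                .map((t) => t.text)
                .join("")
                .trim(),
        }))
        // Pages without any text in the largest size start no new chapter
        .filter((chapter) => chapter.headline.length > 0);
}

const chapters = pickHeadlines([
    { pageNr: 1, texts: [{ text: "Intro", size: 32 }, { text: "body", size: 12 }] },
    { pageNr: 2, texts: [{ text: "more body", size: 12 }] },
    { pageNr: 3, texts: [{ text: "Set", size: 32.000001 }, { text: "up", size: 32 }] },
]);

chapters.forEach((c) => console.log(c.pageNr + ": " + c.headline));
// 1: Intro
// 3: Setup
```

This keeps the same core assumption as before (largest font size means level-1 headline), so body text that happens to use the largest size would still be picked up.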

Once I have this array, I can use it to add the outline:

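import fs from "node:fs/promises";
import { outlinePdfCjs } from "@lillallol/outline-pdf-cjs";

export async function addOutline(filePath) {
    const chapters = await getHeadlines(filePath);

    // One line per entry: "<page number>||<title>"
    let outline = chapters.map(chapter => {
        return chapter.pageNr + "||" + chapter.headline
    }).join("\n");

    // Write next to the original: fs.rename cannot move files across
    // filesystems, and /tmp is often a separate one
    let tmpFile = filePath + ".outlined.pdf";

    await outlinePdfCjs({
        loadPath: filePath,
        savePath: tmpFile,
        outline
    });

    // rename replaces the original file with the outlined one
    await fs.rename(tmpFile, filePath)
}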
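Since every outline entry has to fit on one line, a headline that contains a line break would corrupt the outline string. A small sketch of building the string defensively, assuming the same pageNr||headline format as above (buildOutline is a hypothetical helper, not part of the library):

```javascript
// Hypothetical helper: turns chapters into the "pageNr||title" lines used above.
function buildOutline(chapters) {
    return chapters
        .map((chapter) => {
            // Collapse line breaks and repeated whitespace inside titles
            const title = chapter.headline.replace(/\s+/g, " ").trim();
            return chapter.pageNr + "||" + title;
        })
        .join("\n");
}

console.log(buildOutline([
    { pageNr: 1, headline: "Intro" },
    { pageNr: 3, headline: "Set\nup" },
]));
// 1||Intro
// 3||Set up
```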

Check out the repo for the full source code.