Raphael Stäbler

Web Development

PDF Generation for Node.js using Puppeteer

PDF generation has always been something of a hassle throughout my career as a software developer. Most PDF libraries still feel very low-level, and you almost always end up building your own rendering engine.

Build your own rendering engine?

As an example, let’s have a quick look at a piece of code taken from PDFKit’s documentation:

doc.fillColor('green')
    .text(lorem.slice(0, 500), {
      width: 465,
      continued: true
    })
    .fillColor('red')
    .text(lorem.slice(500))

The code above takes the text stored in a variable called lorem and renders the first 500 characters in green and everything after that in red. That doesn’t look too complicated, does it? In a real-world setting, though, you would more likely want to highlight a few specific words within a block of text. To do that, you would need to determine the character ranges of those words within the text and dynamically build something like the code above. Now imagine you also want a column layout, or a table with some images in it. Things get complicated pretty fast, and you may indeed find yourself building a PDF rendering engine.
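To make the word-highlighting case concrete, here is a minimal sketch of the bookkeeping it takes. The helper names splitByHighlights and renderSegments are my own and not part of PDFKit. The second function only assumes a PDFKit-style document object with chainable fillColor and text methods, as in the snippet above.

```javascript
// Split `text` into consecutive segments and mark the ones that match
// one of `words` (whole words only, case-sensitive).
function splitByHighlights(text, words) {
  // Escape regex metacharacters so every word is matched literally.
  const escaped = words
    .filter(Boolean)
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  if (escaped.length === 0) return [{ text, highlight: false }];

  const pattern = new RegExp(`\\b(?:${escaped.join('|')})\\b`, 'g');
  const segments = [];
  let cursor = 0;
  for (const match of text.matchAll(pattern)) {
    if (match.index > cursor) {
      // Plain text between the previous match and this one.
      segments.push({ text: text.slice(cursor, match.index), highlight: false });
    }
    segments.push({ text: match[0], highlight: true });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), highlight: false });
  }
  return segments;
}

// Feed the segments to a PDFKit-style document. Every segment except the
// last one is rendered with `continued: true` so the text keeps flowing.
function renderSegments(doc, segments, options = { width: 465 }) {
  segments.forEach((segment, index) => {
    const isLast = index === segments.length - 1;
    doc
      .fillColor(segment.highlight ? 'red' : 'black')
      .text(segment.text, { ...options, continued: !isLast });
  });
  return doc;
}
```

And this is only colored words. The \b word boundaries are a simplification that will miss terms starting or ending with punctuation, and columns, tables and images would each need their own bookkeeping on top.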

Use an existing rendering engine

As it happens, we do have highly sophisticated rendering engines for complex layouts and they’re called browsers. Firing up a web browser just to render a PDF does create a lot of overhead, though. If your goal is to create PDFs as efficiently as possible, a browser is certainly not the way to go. In any other case, however, the benefits might outweigh the performance concerns by a lot.

Enter Puppeteer

Puppeteer is a Node.js library that drives an automated Chromium instance, headless by default. It can be used for many things like automated UI testing, automated form submission and web browsing as well as automated screenshot and PDF generation.

Generating a PDF with Puppeteer is pretty simple:

const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('https://www.medium.com')
    await page.pdf({path: 'medium.pdf', format: 'A4'})
    await browser.close()
})()

So, you basically launch a browser, open a page, print the page to a PDF file and close the browser.

Instead of a file you can also print your PDF to a buffer by omitting the path option:

const buffer = await page.pdf({format: 'A4'})

Creating a layout

Now, all that’s left to do is create a layout using HTML and CSS. In fact, you can use any web technology you like — even JavaScript, SVG or Canvas.

I like to maximize control over the layout of single pages, so I found it very helpful to create a page container:

.page {
    position: relative;
    overflow: hidden;
    padding: 0.8in;
    page-break-after: always;
}
.page.landscape {
    width: 11.7in;
    height: 8.2in;
}
.page.portrait {
    width: 8.3in;
    height: 11.6in;
}

You may want to use position: relative in order to be able to absolutely position some of your elements on the page.

You may also want to use page-break-after: always in order to force page breaks after every page.

You can then simply go about creating your PDF pages in HTML:

<body>
    <div id="page1" class="page landscape"></div>
    <div id="page2" class="page landscape"></div>
</body>

If you want to use a landscape layout like the example above you would also need to tell Puppeteer to print in landscape:

await page.pdf({format: 'A4', landscape: true})

You may also want to enable printing of backgrounds in case you need to use background colors or images:

await page.pdf({format: 'A4', landscape: true, printBackground: true})
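Since the orientation has to match in two places (the CSS class on each page container and the landscape flag passed to Puppeteer), it can help to derive both from a single value. The following is just a sketch of that idea. buildDocument and its parameters are names I made up for illustration, and the page contents are assumed to be trusted HTML strings.

```javascript
// Build the page containers and the matching page.pdf() options
// from a single orientation value so the two cannot drift apart.
function buildDocument(pagesHtml, orientation = 'portrait') {
  if (orientation !== 'portrait' && orientation !== 'landscape') {
    throw new Error(`Unknown orientation: ${orientation}`);
  }

  // One .page container per entry, using the class names from the CSS above.
  const pages = pagesHtml
    .map((content, index) =>
      `    <div id="page${index + 1}" class="page ${orientation}">${content}</div>`)
    .join('\n');

  return {
    html: `<body>\n${pages}\n</body>`,
    pdfOptions: {
      format: 'A4',
      landscape: orientation === 'landscape',
      printBackground: true,
    },
  };
}

const { html, pdfOptions } = buildDocument(['<h1>Title</h1>', '<p>Details</p>'], 'landscape');
console.log(html);
console.log(pdfOptions);
```

The returned pdfOptions object can be passed to page.pdf() as is. The html fragment still needs the stylesheet with the .page rules around it.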

In fact, using background images instead of img tags can give you more control while placing images inside the layout:

.page .title-image {
    width: 100%;
    height: 4in;
    background: url("…") no-repeat center center;
    background-size: contain;
}

<div class="page landscape">
    <div class="title-image"></div>
</div>

That way you can use CSS properties like background-position and background-size to lay out your images optimally. This can be especially helpful when dealing with dynamic content where image dimensions may vary.

Thanks to absolute positioning, placing headers and footers is no hassle, either:

.page .footer {
    position: absolute;
    left: 0.8in;
    right: 0.8in;
    bottom: 0.2in;
    border-top: 1px solid #000;
    padding: 0.1in 0 0;
}

<div class="page landscape">
    <div class="footer"></div>
</div>

You get tables, borders, margins, paddings, positions and colors for free. Of course, rich text comes shipped, too, using CSS to vary font sizes and styles, or using HTML elements like strong or em.

You can even include your own font face using web fonts. Font face generation is very easy using tools like Transfonter.

Generating dynamic content

So far, you’ve got your renderer and your layout. Now you probably want to fill the latter with dynamic content. Otherwise, PDF generation wouldn’t make much sense in most cases.

Chances are you’re already working inside a web application environment when developing your PDF generation. In that case you most likely already have the means of creating and serving dynamic HTML templates in place.

In case you don’t, you could use a combination of Express and a template engine like Pug or Mustache.

Your Node.js code will then probably look something like this:

const express = require('express')
const mustacheExpress = require('mustache-express')
const puppeteer = require('puppeteer')
const app = express()
app.engine('html', mustacheExpress())
app.set('view engine', 'html')
app.get('/export/html', (req, res) => {
    const templateData = {}
    res.render('template.html', templateData)
})
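app.get('/export/pdf', async (req, res) => {
    let browser
    try {
        browser = await puppeteer.launch()
        const page = await browser.newPage()
        await page.goto('http://localhost:3000/export/html')
        const buffer = await page.pdf({format: 'A4'})
        res.type('application/pdf')
        // Newer Puppeteer versions may return a Uint8Array, so normalize to a Buffer
        res.send(Buffer.from(buffer))
    } catch (err) {
        // Without this, a failed launch or navigation would leave the request hanging
        res.status(500).send('PDF generation failed')
    } finally {
        // Always close the browser, even if rendering failed
        if (browser) await browser.close()
    }
})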
app.listen(3000)

That’s it. You’re all set. A call to localhost:3000/export/pdf fires up a headless Chromium browser, loads localhost:3000/export/html in it, renders the content to a PDF and sends the result back to the user’s browser.
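Launching a fresh Chromium for every request is the expensive part of this setup. If that becomes a problem, one option is to launch the browser once and share it between requests. Here is a minimal sketch of that idea. createBrowserProvider is a hypothetical helper of mine, and the launch argument is assumed to be something like () => puppeteer.launch().

```javascript
// Returns a getBrowser() function that launches the browser lazily on
// first use and hands out the same instance to all later callers.
function createBrowserProvider(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (browserPromise) return browserPromise;

    const pending = Promise.resolve()
      .then(() => launch())
      .then((browser) => {
        // Forget the instance if the browser crashes or gets closed,
        // so that the next call launches a new one.
        browser.once('disconnected', reset);
        return browser;
      });

    function reset() {
      if (browserPromise === pending) browserPromise = null;
    }

    // A failed launch should not be cached forever either.
    pending.catch(reset);

    // Cache the promise (not the browser) so concurrent callers share one launch.
    browserPromise = pending;
    return pending;
  };
}
```

In the route handler you would then call getBrowser() instead of puppeteer.launch(), and close the page rather than the whole browser once the PDF has been generated.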

Of course, you might want to do other things with your PDF, like saving it to disk or sending it out by email. In that case you may not need a route for /export/pdf, but the basic mechanics are still the same.
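For the save-to-disk case, a rough sketch could look like the following. It skips the HTTP round trip by handing the rendered HTML to the page directly via page.setContent. The function name htmlToPdfFile and its parameters are my own, and browser is assumed to be an already launched Puppeteer browser.

```javascript
// Render an HTML string to a PDF file using an already launched browser.
async function htmlToPdfFile(browser, html, outputPath) {
  const page = await browser.newPage();
  try {
    // Load the markup directly instead of navigating to a URL.
    await page.setContent(html, { waitUntil: 'networkidle0' });
    // With the path option set, Puppeteer writes the file itself.
    await page.pdf({ path: outputPath, format: 'A4', printBackground: true });
  } finally {
    // Close the page even if rendering failed; the browser stays open.
    await page.close();
  }
}
```

The html string would come from your template engine of choice. Keep in mind that a page filled this way has no base URL, so images and stylesheets will most likely need absolute URLs or have to be inlined.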

One last thing: I found it to be advisable to add another option to Puppeteer’s goto method:

page.goto('…', {waitUntil: 'networkidle0'})

The option waitUntil: 'networkidle0' tells Puppeteer to consider the page fully loaded only once there have been no open network connections for at least 500 ms. Other values (load, domcontentloaded and networkidle2) are available and may be more useful in other cases.
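As a quick reference, here is a small sketch that lists the waitUntil values I know of and guards against typos. gotoForPrint is a name I made up, and page is assumed to be a Puppeteer page.

```javascript
// The lifecycle events page.goto() accepts for waitUntil:
//   load             - the load event has fired
//   domcontentloaded - the DOMContentLoaded event has fired
//   networkidle0     - no network connections for at least 500 ms
//   networkidle2     - no more than 2 network connections for at least 500 ms
const WAIT_UNTIL_VALUES = new Set([
  'load',
  'domcontentloaded',
  'networkidle0',
  'networkidle2',
]);

// Navigate to `url` and resolve once the chosen lifecycle event was reached.
async function gotoForPrint(page, url, waitUntil = 'networkidle0') {
  if (!WAIT_UNTIL_VALUES.has(waitUntil)) {
    throw new TypeError(`Unsupported waitUntil value: ${waitUntil}`);
  }
  return page.goto(url, { waitUntil });
}
```

For a page that keeps a connection open all the time (long polling, for example), networkidle0 may never be reached before the navigation timeout, so networkidle2 or load might be the better fit there.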

Conclusion

By using what is probably the most common rendering engine for all kinds of layouts, you can save yourself a lot of trouble when dealing with PDF generation. It’s not that CSS-based layouts are always painless and straightforward, but you probably already have some experience with them. On top of that, you don’t have to worry about cross-browser compatibility and the like when creating layouts for one single browser version: your layout only has to work in one specific place. This gives you the freedom to easily create sophisticated and professional-looking PDF layouts for your applications.