How to save a web page as PNG or PDF with Nightmare?

Introduction

In this example we’ll see how to use Nightmare and Xvfb, and how to bootstrap the main application code properly, in order to make a cross-platform script that saves a web page as PDF or PNG.

Let’s start by initializing our application packages and preparing the application’s index.mjs. From inside the new application folder, create a default package.json and install the packages:

npm init -y
npm i --save nightmare @cypress/xvfb
touch index.mjs

To keep top-level async code simple, we will use index.mjs as the main JavaScript file. This way the file is treated as an ES module, which enables top-level await and import/export statements.

With the packages installed, we can start bootstrapping our application. Let’s start by setting up Xvfb:

import process from 'process'; 
import util from 'util'; 

import Xvfb from '@cypress/xvfb';
import Nightmare from 'nightmare'; 

const has_display = (['win32', 'darwin'].indexOf(process.platform) !== -1);

if (!has_display) {
  var xvfb = new Xvfb();
  xvfb.startPromise = util.promisify(xvfb.start);
  xvfb.stopPromise = util.promisify(xvfb.stop);
  await xvfb.startPromise();
} else {
  // mock xvfb object
  var xvfb = {
    stopPromise: async function() {}
  }
}

What we are doing here is making sure that an Xvfb session is started when we are on an operating system other than Windows or macOS, where a display may not be available. This is a more or less safe approach and should cover most cases. On Windows or macOS, we mock the xvfb object with an empty stopPromise method, so we can safely call it later without additional conditions. We will also use the show flag later to control whether Nightmare should show the Electron window or not.
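
The platform check can also be written as a small function, which makes it easier to test and to refine. The sketch below is only an illustration: needsVirtualDisplay is a hypothetical helper, and treating a non-empty DISPLAY environment variable as "an X server is already available" is an assumption that goes beyond the check used in this article.

```javascript
// Hypothetical helper: decide whether a virtual display (Xvfb) is needed.
function needsVirtualDisplay(platform = process.platform, env = process.env) {
  // Windows and macOS provide a display for Electron out of the box.
  if (platform === 'win32' || platform === 'darwin') {
    return false;
  }
  // Assumption: on other systems a non-empty DISPLAY variable means an
  // X server is already available, so Xvfb can be skipped.
  return !env.DISPLAY;
}

console.log(needsVirtualDisplay('win32', {}));                // false
console.log(needsVirtualDisplay('linux', {}));                // true
console.log(needsVirtualDisplay('linux', { DISPLAY: ':0' })); // false
```

With such a helper, the condition in the bootstrap code would read if (needsVirtualDisplay()) instead of if (!has_display).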

Now, let’s initialize our nightmare object and go to the Wikipedia page for Gallium (which is also the codename of Node.js 16).

const n = Nightmare({ 
  show: false,
  maxHeight: 8000
});
await n.viewport(1024, 800);
await n.goto('https://en.wikipedia.org/wiki/Gallium');

What these three statements do is initialize the Nightmare browser, resize the window and load the URL. We will get to the maxHeight parameter later.

You will notice that Nightmare actions are being awaited. That is because every Nightmare call is chainable and the chain can be awaited like a Promise. You could also chain every call and await it only once, depending on your needs, e.g.:

await n.viewport(1024, 800)
       .goto('https://en.wikipedia.org/wiki/Gallium')
       .click('.that_button');
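
To see why a single await can run a whole chain, here is a toy model of a chainable, awaitable object. It is not Nightmare’s actual implementation: ActionQueue and its add method are hypothetical names, and the sketch only illustrates the general idea of queuing actions and running them when then() is called.

```javascript
// Toy model of a chainable, awaitable API. This is NOT Nightmare's code;
// ActionQueue and its method names are hypothetical.
class ActionQueue {
  constructor() {
    this.actions = [];
  }

  // Queue an action and return `this`, so calls can be chained.
  add(fn) {
    this.actions.push(fn);
    return this;
  }

  // Having a then() method makes the object awaitable: `await queue`
  // calls then(), which runs the queued actions one after another.
  then(resolve, reject) {
    const pending = this.actions;
    this.actions = [];
    const run = async () => {
      let result;
      for (const action of pending) {
        result = await action();
      }
      return result; // value returned by the last action
    };
    return run().then(resolve, reject);
  }
}

async function demo() {
  const log = [];
  const queue = new ActionQueue();
  // One await runs the whole chain, in order.
  const result = await queue
    .add(async () => { log.push('viewport'); })
    .add(async () => { log.push('goto'); return 'loaded'; });
  return { log, result };
}

// Logs the two action names in order, followed by 'loaded'.
demo().then(({ log, result }) => console.log(log, result));
```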

Generate PDF

As an example, let’s remove the Wiki logo and replace the title:

await n.evaluate(function() {
  // hide wiki logo
  var sheet = window.document.styleSheets[0];
  sheet.insertRule('#firstHeading:before { content: none }', sheet.cssRules.length);

  // change title
  document.getElementById('firstHeading').innerText = 'Testing automated PDF generation with Node.js and Nightmare';
});

Now, let’s just save the PDF.

// pdf
await n.pdf('generated.pdf', {
  printBackground: true
});

Generate PNG

Easy as pie. Now, let’s see what has to be done for PNG. It looks like it’s as easy as calling n.screenshot('generated.png') after resizing the viewport to our document height:

// png
// detect the viewport height
var height = await n.evaluate(function() {
  return document.body.scrollHeight;
});
await n.viewport(1024, height);
await n.screenshot('generated.png');

That works until the page length grows to thousands of pixels. You might be surprised, but it doesn’t work with really long pages. Remember how we set maxHeight: 8000? This is all due to a bug in Chromium, which has a limit of 16384 pixels (some people reported 8192 pixels, depending on the system it runs on). So, to overcome this, we need to take several screenshots and glue them together. And suddenly our code grows into this:

const height = await n.evaluate(function() {
  return document.body.scrollHeight;
});
const pageheight = 4096;
const pages = Math.ceil(height / pageheight);
for (var a = 1; a <= pages; a++) {
  // where to scroll
  var offset = (a-1) * pageheight;
  // what size of the window to set
  var remainder = Math.min(pageheight, height - offset);
  await n.viewport(1024, remainder);
  // actually scroll the document
  await n.evaluate(function(offset) {
    document.documentElement.scrollTop = offset;
  }, offset);
  // screenshot!
  await n.screenshot(`generated.${a}.png`);
}
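
The paging arithmetic in the loop above can be checked on its own, without starting a browser. The following sketch uses a hypothetical planPages helper that returns, for each page, the scroll offset and the height of the slice to capture; the numbers in the example are made up for illustration.

```javascript
// Hypothetical helper: split a document of `totalHeight` pixels into
// slices of at most `pageHeight` pixels.
function planPages(totalHeight, pageHeight) {
  const count = Math.ceil(totalHeight / pageHeight);
  const pages = [];
  for (let i = 0; i < count; i++) {
    const offset = i * pageHeight; // where to scroll
    pages.push({
      page: i + 1,
      offset,
      // the last slice may be shorter than a full page
      height: Math.min(pageHeight, totalHeight - offset),
    });
  }
  return pages;
}

console.log(planPages(10000, 4096));
// [
//   { page: 1, offset: 0, height: 4096 },
//   { page: 2, offset: 4096, height: 4096 },
//   { page: 3, offset: 8192, height: 1808 }
// ]
```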

Issues with PNG generation

This looks like it should work, but again it doesn’t :D. Two things are missing:

1. On high-DPI systems the screenshots have different dimensions from the ones we set in viewport. This can be fixed by setting the device scale factor switch on the Nightmare object:

const n = Nightmare({
  switches: { 'force-device-scale-factor': '1' },
  // ...plus the show and maxHeight options from before
});

2. After fixing issue 1, the screenshots are still a bit off from what we set via n.viewport. I found that on Windows the difference is always exactly 67px. Thus, let’s adjust our viewport call a bit (you might need to adjust the number for your system):

const viewportMagicNumber = 67; // on Windows, screenshots are cut by 67 pixels
await n.viewport(1024, remainder + viewportMagicNumber);

With the two issues above fixed, we can move on to gluing our screenshots into one. For this task, we’ll use the sharp package. Back to our command line:

npm i --save sharp

What we need essentially is to create an empty image with the size of our document, and then draw the paged images into specific places. Import the sharp constructor first:

import sharp from 'sharp';

Now back to our PNG generation, create an empty image first:


var image = sharp({
  create: {
    width: 1024,
    height: height,
    channels: 4,
    background: { r: 255, g: 255, b: 255, alpha: 0 }
  }
});

Now a bit of JavaScript magic to iterate through every page and construct a composite option:


await image.composite(Array.from(Array(pages).keys()).map(function(value, index) {
  return {
    input: `generated.${index+1}.png`,
    left: 0,
    top: pageheight * index
  }
})).toFile('generated.png');

This essentially just makes an array of objects with input, left and top. The end result is saved into generated.png, and of course we await it because toFile is an asynchronous operation.
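
The same array can be built by a small standalone function, which makes the layout easy to inspect before handing it to composite. This is only a sketch: buildCompositeLayers and its prefix parameter are hypothetical, and the file names follow the generated.N.png pattern used above.

```javascript
// Hypothetical helper: describe where each page screenshot goes
// on the final canvas.
function buildCompositeLayers(pages, pageHeight, prefix = 'generated') {
  return Array.from({ length: pages }, (_, index) => ({
    input: `${prefix}.${index + 1}.png`, // file written by n.screenshot
    left: 0,
    top: pageHeight * index, // stack the pages vertically
  }));
}

console.log(buildCompositeLayers(3, 4096));
// [
//   { input: 'generated.1.png', left: 0, top: 0 },
//   { input: 'generated.2.png', left: 0, top: 4096 },
//   { input: 'generated.3.png', left: 0, top: 8192 }
// ]
```

The result could then be passed straight to image.composite(...) in place of the inline Array.from expression.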

Things to improve

As always, there are a few things to improve. Here is what should be done to make it nice and complete:

  1. The PNGs have a visible scrollbar – this can be fixed by injecting CSS that hides the scrollbar altogether.
  2. The width of the pages is 1006px instead of 1024px. This is probably due to the scrollbar, but needs some investigation.
  3. Remove generated.$page.png after the generation is done.
  4. Error handling.
  5. Functionalize crucial parts of the script.
  6. Export the main module functionality as Promises to be used in other applications.

That’s it! Here’s the full script once again:

import process from 'process'; 
import util from 'util'; 

import Xvfb from '@cypress/xvfb';
import Nightmare from 'nightmare'; 
import sharp from 'sharp';

const has_display = (['win32', 'darwin'].indexOf(process.platform) !== -1);

if (!has_display) {
  var xvfb = new Xvfb();
  xvfb.startPromise = util.promisify(xvfb.start);
  xvfb.stopPromise = util.promisify(xvfb.stop);
  await xvfb.startPromise();
} else {
  // mock xvfb object
  var xvfb = {
    stopPromise: async function() {}
  }
}

const n = Nightmare({ 
  switches: { 'force-device-scale-factor': '1' },
  show: false,
  maxHeight: 8000
});
await n.viewport(1024, 800);
await n.goto('https://en.wikipedia.org/wiki/Gallium');
await n.evaluate(function() {
  // hide wiki logo
  var sheet = window.document.styleSheets[0];
  sheet.insertRule('#firstHeading:before { content: none }', sheet.cssRules.length);

  // change title
  document.getElementById('firstHeading').innerText = 'Testing automated PDF generation with Node.js and Nightmare';
});

// pdf
await n.pdf('generated.pdf', {
  printBackground: true
});

// png
// detect the viewport height
const height = await n.evaluate(function() {
  return document.body.scrollHeight;
});

const pageheight = 4096;
const pages = Math.ceil(height / pageheight);
const viewportMagicNumber = 67; // on Windows, screenshots are cut by 67 pixels
for (var a = 1; a <= pages; a++) {
  // where to scroll
  var offset = (a-1) * pageheight;
  // what size of the window to set
  var remainder = Math.min(pageheight, height - offset);
  console.log(`Resizing to 1024x${remainder}, scroll to ${offset}`);
  await n.viewport(1024, remainder + viewportMagicNumber);
  // wait for window resize to redraw
  await n.wait(1000);
  // actually scroll the document
  await n.evaluate(function(offset) {
    document.documentElement.scrollTop = offset;
  }, offset);
  // screenshot!
  await n.screenshot(`generated.${a}.png`);
}

var image = sharp({
  create: {
    width: 1024,
    height: height,
    channels: 4,
    background: { r: 255, g: 255, b: 255, alpha: 0 }
  }
});

await image.composite(Array.from(Array(pages).keys()).map(function(value, index) {
  return {
    input: `generated.${index+1}.png`,
    left: 0,
    top: pageheight * index
  }
})).toFile('generated.png');

// don't forget the .end()
await n.end();

// destroy any xvfb instances
await xvfb.stopPromise();
