Anthony McLin

Statically generate a markdown-based blog using NextJS

I've begun the process of moving my site from Drupal to a custom-built React application stack using NextJS. As part of this, I'm moving content to markdown files kept with the source code, rather than maintaining a CMS for simple content storage. Let me take you through the process of how you can use NextJS to efficiently render markdown files as static HTML for fast performance, quick build times, and maximum SEO.

Your first instinct might be to use NextJS's dynamic routing feature. In principle, a route can look like https://example.com/blog/[articleSlug] where [articleSlug] is a dynamic value, allowing any number of pages to be created without making individual Javascript files for each one. We do this by creating the file /src/pages/blog/[articleSlug].js in our project, and NextJS will automatically provide articleSlug as a variable to the Javascript file. From there, we could conceivably use the fs.readFileSync() method from NodeJS to load /content/${articleSlug}.md from the file system and render out the contents using the Markdown processor of our choice.

Let's see what that looks like:

// contents of /src/pages/blog/[slug].jsx

import { useRouter } from 'next/router'
import ReactMarkdown from 'react-markdown' // Our chosen Markdown processor
import { existsSync, readFileSync } from 'fs'
import React from 'react'

const { error, debug } = console // Prevents eslint errors when using console.log
const contentPath = '../../../content'

/**
 * Maps a slug to the content path on the filesystem
 * @param {string} slug URL segment for the unique article
 * @returns {string} The relative path to the file
 */
const formatPath = (slug) => {
  return `${contentPath}/${slug}.md`
}

/**
 * Checks if a specified article exists on the filesystem as a markdown file
 * @param {string} slug URL segment for the unique article
 * @returns {boolean} Whether the file exists
 */
const articleDoesExist = (slug) => {
  try {
    if (existsSync(formatPath(slug))) {
      return true
    }
  } catch (err) {
    error(`${slug}.md does not exist in the content directory.`)
    debug(err)
  }
  return false
}

const Article = () => {
  const router = useRouter()
  const { slug } = router.query

  if (articleDoesExist(slug)) {
    const contents = readFileSync(formatPath(slug), 'utf-8')
    return (
      <article>
        <ReactMarkdown source={contents} />
      </article>
    )
  }

  // Let error handling generate a 404
  throw new Error('Article does not exist')
}

export default Article

If the file doesn't exist, we should get an error. The markdown files are dynamically linked by filename, so we could have as many as we want with no change to the code. We could easily extend this in the future with more functionality without breaking existing articles. We have a nice place to hook in redirects or other types of behaviors. Overall this looks like a fairly clean and robust solution that should scale nicely.

Let's fire up the local NextJS dev environment and see what happens when we try to visit http://localhost:3000/blog/example-blog-post: the page fails to load. What went wrong? Well, the browser console gives us the clue:

./src/pages/blog/[slug].jsx
Module not found: Can't resolve 'fs' in '/Users/amclin/Documents/git/anthonymclin.com/src/pages/blog'

We can't use fs.existsSync, or for that matter any of the fs methods, because our React code is running in the browser. fs is a NodeJS API, and is therefore only available in server-side code. Browsers cannot call it. NextJS by default leverages a hybrid model where the same code runs on both the client and server, allowing for a lot of optimizations and simplification of client/server communication. So how can we solve this and dynamically build our blog articles?

Approaches

Content API

NextJS has a built-in feature for making APIs very quickly by placing Javascript files directly in the /src/pages/api folder. So we could make a quick API that exposes the content. Like page routes, the API routes in NextJS also support dynamic segments, so our API for retrieving an article could be at https://example.com/api/blog/article/[slug].

Let's take a look at what that could look like:

// contents of /src/pages/api/blog/article/[slug].js

import { existsSync, readFileSync } from 'fs'

const { error, debug } = console
const contentPath = 'content/drafts'

/**
 * Maps a slug to the content path on the filesystem
 * @param {string} slug URL segment for the unique article
 * @returns {string} The relative path to the file
 */
const formatPath = (slug) => {
  return `${contentPath}/${slug}.md`
}

/**
 * Checks if a specified article exists on the filesystem as a markdown file
 * @param {string} slug URL segment for the unique article
 * @returns {boolean} Whether the file exists
 */
const articleDoesExist = (slug) => {
  try {
    if (existsSync(formatPath(slug))) {
      return true
    }
  } catch (err) {
    error(`${slug}.md does not exist in the content directory.`)
    debug(err)
  }
  return false
}

const apiHandler = (req, res) => {
  // Protect the API from other request types
  if (req.method !== 'GET') {
    res
      .status(400)
      .json({ message: `request type ${req.method} not supported` })
    return
  }
  // 404 when article doesn't exist
  if (!articleDoesExist(req.query.slug)) {
    const message = `Article ${req.query.slug} does not exist`
    error(`${message} in ${contentPath}`) // Server side error log
    res.status(404).json({ message }) // Error returned to client
    return
  }

  const content = readFileSync(formatPath(req.query.slug), 'utf-8')
  res.status(200).json({ content })
}

export default apiHandler

This small API is doing a few things. First, requests are validated to ensure that only HTTP GET request types are used. This helps protect against DDoS attacks and makes it easy to implement caching later. Any other request type gets an immediate 400 error response. Next, the API checks to see if a matching article exists, using the same approach that we tried originally in the dynamic pages attempt. If no matching article exists, then we return an HTTP 404 error code. (Notice that just like before, we're only returning the article slug in the error message, and not the file system path. Exposing the full filesystem path to the content files could be a security risk.)

If we make a request to the API endpoint with the article slug, at http://localhost:3000/api/blog/article/example-blog-post, we should get a JSON result containing the markdown.
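The markdown itself will be whatever is in your example-blog-post.md file (the text below is only a placeholder), but since the handler responds with res.status(200).json({ content }), the body is a single content field holding the raw markdown as a string:

{
  "content": "# Example Blog Post\n\nThe raw markdown source of the article appears here as one long string..."
}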

This gets us the contents of the file delivered to the browser, but we still need a page to host it. Let's go back to the /src/pages/blog/[slug].jsx file and replace the contents:

// contents of /src/pages/blog/[slug].jsx

import { useRouter } from 'next/router'
import useSWR from 'swr'
import ReactMarkdown from 'react-markdown'
import React from 'react'

const apiPath = '/api/blog/article'

const { error, debug } = console

/**
 * Async handler for AJAX requests
 * @param {string} url to fetch
 */
const fetcher = async (url) => {
  debug(`Loading blog article at ${url}`)
  const res = await fetch(url)
  const data = await res.json()

  if (res.status !== 200) {
    throw new Error(data.message)
  }
  return data
}

const Article = () => {
  const { query } = useRouter()
  const { data, error } = useSWR(
    () => query.slug && `${apiPath}/${query.slug}`,
    fetcher
  )

  if (error) return <div>{error.message}</div>
  if (!data) return <div>Loading...</div>

  return (
    <article>
      <ReactMarkdown source={data.content} />
    </article>
  )
}

export default Article

This modified page file now displays a "loading" message when the page loads, passes the article slug to the API we built, and redraws the page to render the contents when the API responds. The magic of the redraw happens using React Hooks and the useSWR library, but we could have done the same thing ourselves with a normal fetch async callback that pushes the result into state or props for a React component.
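For reference, a hand-rolled version of the same page might look roughly like the sketch below. This is not part of the final code, just an illustration of the plain-fetch approach; it assumes the same apiPath and API endpoint described above, and skips details like cancelling stale requests:

// Sketch: the same page without useSWR
import { useRouter } from 'next/router'
import ReactMarkdown from 'react-markdown'
import React, { useState, useEffect } from 'react'

const apiPath = '/api/blog/article'

const Article = () => {
  const { query } = useRouter()
  const [content, setContent] = useState(null)
  const [errorMessage, setErrorMessage] = useState(null)

  useEffect(() => {
    // The router query is empty on the very first render
    if (!query.slug) return
    fetch(`${apiPath}/${query.slug}`)
      .then(async (res) => {
        const data = await res.json()
        if (res.status !== 200) {
          throw new Error(data.message)
        }
        setContent(data.content)
      })
      .catch((err) => setErrorMessage(err.message))
  }, [query.slug])

  if (errorMessage) return <div>{errorMessage}</div>
  if (!content) return <div>Loading...</div>

  return (
    <article>
      <ReactMarkdown source={content} />
    </article>
  )
}

export default Article

useSWR gives us the same behavior with less code, plus caching and revalidation for free, which is why it's used above.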

On the surface, this solution seems solid. It splits the app into a traditional two-layer architecture, and by abstracting the content behind an API, we make it possible to move content into a database or other source without recompiling the front-end code that runs in the browser.

However, there are a few fatal flaws here.

  1. Our app now requires server-side javascript. This means running a NodeJS server, which is a lot more maintenance and cost than simple static file serving like AWS S3 provides.
  2. We've hurt our Search Engine Optimization (SEO). If you were to load the page with Javascript disabled, the "loading" message would never get replaced with the actual article, because the raw HTML delivered to the browser doesn't include the article contents. Yes, many search engines now execute Javascript when they index a page, but if the Javascript execution errors for some reason, the contents will never be reached.
  3. We've introduced performance bottlenecks. Waiting for the DOM to render and the Javascript to make extra requests means it takes longer before the content is displayed. Since search engines now factor page speed into your ranking, that's doubly problematic.
  4. The supposed benefit here is that we can change out the API without rebuilding and redeploying our browser code. But NextJS treats the browser code and server-side code as one app. If we rebuild and redeploy one, we're rebuilding and redeploying both. We would be able to deploy new article markdown files without the need to deploy the rest of the app, but that's the only piece that's truly decoupled.

These caveats mean this isn't the appropriate solution unless we're migrating our site towards completely dynamic content. If our content is largely static, there's not much sense in dynamically loading it every time. We lose all the optimizations that static HTML can give us out of the box.

Dynamically generate static content pages

The best feature of NextJS as a React framework is that it is simultaneously a Static Site Generator (like Gatsby), a Server-Side-Rendering framework, and a hybrid of the two. It's easy to switch between several different server-to-client delivery patterns with very minor code changes. Since our blog content is going to be static, it makes a lot of sense to generate it only once at build time, so we have pre-optimized HTML pages that can be displayed in the browser or indexed by search engines without waiting for any Javascript to run. We can take advantage of super-cheap static web hosting like S3, and worldwide read-only caching using a CDN like CloudFront. We can even get hosting for free from Netlify if we want to make the site source public on GitHub.

Any Javascript page files within the /src/pages folder will automatically be rendered as HTML files when we run the NextJS next build && next export commands. But since we have a dynamic route to the blog pages using /src/pages/blog/[slug].jsx, we will only see one HTML file generated for the "loading" screen, and no HTML files generated for the individual blog articles. This is where we can leverage the NextJS function getStaticPaths(). We need to build a list of paths for NextJS to automatically create HTML files for export.
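As an aside, I find it convenient to wrap these commands in npm scripts. Something like the following in package.json works (the script names here are just an example, not necessarily what my repo uses):

{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "export": "next build && next export"
  }
}

With that in place, npm run export generates the static site into the /out directory.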

Let's go back to the /src/pages/blog/[slug].jsx and add two new functions:

/**
 * Loads the data to be used as props at build time
 * @param {object} context NextJS Context object
 */
const getStaticProps = async ({ params }) => {
  // readFileSync must be imported from 'fs' again; it's safe here because this only runs at build time
  const articlePath = formatPath(params.slug)
  debug(`loading ${articlePath}`)
  const content = readFileSync(articlePath, 'utf-8')

  return {
    props: {
      content
    }
  }
}

/**
 * Generates the list of article slugs and creates router paths for
 * each article at build time
 * https://nextjs.org/docs/basic-features/data-fetching#getstaticpaths-static-generation
 */
const getStaticPaths = async () => {
  return {
    paths: getArticlesList().map((article) => {
      return {
        params: {
          slug: article
        }
      }
    }),
    fallback: false // Unmatched slugs should 404
  }
}

And replace the existing exports with this:

export { Article as default, getStaticProps, getStaticPaths }

When a page exposes an asynchronous getStaticProps() method, NextJS will use it to pre-render that page at build time. The method will only get called at build time, and never from the browser. This is where we move the file-loading logic, since the API we built previously is not going to be available at build time.

The getStaticPaths() async method is also special. Since our page uses the dynamic variable slug in its path, we need to tell NextJS what paths we want generated during static site generation. When NextJS encounters getStaticPaths() during the static site generation lifecycle, it runs it to get the list of paths it needs to add to the router and to build-time processing. In turn, the slug for each of those paths is passed to the getStaticProps() method when that path is built. This function will only get called during site generation, and will never run in the browser. Therefore, the logic in this method is to scan the content directory for a list of markdown files, and use that to build a list of article paths.

I've abstracted a few things for clarity. The formatPath() method can simply be copied from the API in /src/pages/api/blog/article/[slug].js. Below is the directory-scanning function getArticlesList(). Both should be added to /src/pages/blog/[slug].jsx, but neither needs to be exposed in the exports:

/**
 * Read the articles directory and return a list of slugs that exist
 * @returns {array} list of slugs matching article markdown files
 */
const getArticlesList = () => {
  // readdirSync also comes from the 'fs' module
  const articles = readdirSync(contentPath, { withFileTypes: true })
    .filter((f) => f.isFile()) // Only return files
    .filter((f) => f.name.endsWith('.md')) // Only return files ending in '.md'

  if (articles.length <= 0) {
    // Log the problem and return an empty list so getStaticPaths() doesn't crash
    error(`No articles in content directory ${contentPath}`)
    return []
  }

  // trim the .md off the end of the filename to get the slug
  return articles.map((article) => article.name.split('.md')[0])
}

If you were to run next build && next export at this point, you would see that we get multiple HTML files generated in /out/blog/, one corresponding to each markdown file that exists in the /content/ folder. However, if you loaded them in the browser without Javascript, or looked at the source of the file, you would see that it contains just the loading message, and no actual article contents. This is because our Article() method is still written to call the API we built, and that API isn't available during the static generation lifecycle. We need to rewrite this function to accept the content that is being provided from the getStaticProps() loader that we wrote.

/**
 * Renders a blog article from the markdown content that
 * getStaticProps() provides at build time
 */
const Article = ({ content }) => {
  return (
    <article>
      <ReactMarkdown source={content} />
    </article>
  )
}

Since getStaticProps() does the heavy lifting of providing a props object suitable for consumption by the React component, we only need to accept the content prop and pass it on to the markdown processor. Now that Article() no longer calls the API, we can remove the fetcher() helper method and React hooks from this page. If we want, we could also delete the API itself, since we're no longer using it.

Now if you run next build && next export you'll find that the /out/blog/ directory contains HTML files for each content article, and if you review the source of those files, you'll see that the HTML contains the content of the articles. This means the content is there at initial page load before any Javascript runs, which is great for page performance, and also ensures the search engines can index the content properly for best SEO. At this point we have a 100% statically generated site. If we add a new article markdown file, we do need to regenerate the site with next build && next export, but no code needs to be updated. The articles will get scanned automatically from the /content directory at build time.

If during export you see the warning "Attention: Statically exporting a Next.js application via `next export` disables API routes." it's because you still have the API defined. Statically-generated sites don't have APIs within them, because an API requires server-side code execution. The API is dead code at this point, so it's safe to remove it to make this warning go away.

If you'd like to see all of this working in action, this pull request on GitHub shows how I put all these pieces together to create this blog.
