A Holding Company for Interesting Concepts
Creating a Markdown-Driven NextJS Static Blog Page
By Sapling Corp

One of the driving motivations for creating this section was to give the group a place to keep track of the concepts we work on. Sapling is meant to be a place where people enjoy thinking up problems and solving them through innovative and novel methods. Sometimes we will work on a proof-of-concept or start a project, only to find that it is beyond our collective expertise or needs more time than anyone should sink into it. For the record, starting something and never finishing it is not a bad thing, as long as it was done purely for fun. That is why this website needs a place to keep track of everything, so that other people can take a look and start their own versions or provide feedback, or so that we can come back to it in the future.

The idea with this particular project (a Markdown-to-HTML indexing system) is to keep all of Sapling's posts in one directory and have each file drive a page in the projects section of the website.

Why should Sapling use a Markdown system to drive the content? Why not just create an individual HTML page for each project? That would have been a perfectly good solution, maybe even better than generating pages from Markdown, and it would have been far less work. However, there are two main reasons I was against it.

Firstly, writing a Markdown file is less distracting when producing content: it requires no HTML syntax and can be completely plain text. This really helps the creative writing process, because separating roles is a good way to improve both a product and a way of thinking. It puts the mind in a "creative writing" mode rather than a "creative writing, plus coding and other things" mode.

The second reason is very similar to the first, with one important distinction: it simply makes sense architecturally. By that, I mean things are modular and compartmentalized. Whether I make a simple change to an individual post or a templating change across all posts, I never have to change content and code at the same time. An individual change means editing a single Markdown file, while changing the way posts are displayed means changing code, with no Markdown changes. This follows the popular MVC (Model-View-Controller) architecture, which is dated but something I still really believe in. I think this pattern is far more "natural" and solves a lot of problems that people don't even think of, including ones raised by future additions I haven't thought of yet. That is the most important thing when designing an architecture: future-proofing for the many unpredictable long-tail items that might not have been accounted for during design. Something I'm always reminded of is the effect of asbestos cancer claims on the insurance industry, almost a generation after asbestos was widely used as a construction insulation material.
A good thing to look into may be whether folks knew about the dangers of asbestos when it was widely used, or even suspected that something was "wrong".

The technology stack used for Sapling Corporation is NextJS, so I had to find a Markdown solution that was compatible with it. I found an open-source project called remark, a large, versatile tool for processing Markdown. It was suggested by the NextJS team, who have written a great guide that outlines everything you need to do to get it working. One of the trickiest parts of making it work with NextJS is using getStaticProps and getStaticPaths to generate these pages at build time. Strictly speaking, this step is optional, but it is very important for server performance and cost, and it is clearly the "right" thing to do for this project. Again, I am reminded of asbestos usage's effect on the insurance industry.

Remark gives us a way to transform Markdown content into HTML. This HTML can then be injected directly into a template using React's dangerouslySetInnerHTML prop. As the name suggests, this is dangerous if the input is not sanitized. Since the HTML is converted from Markdown that we write ourselves, the risk is low, but it is not zero: Markdown syntax permits raw HTML, so whether that reaches the page depends on how the converter is configured, and there could also be a vulnerability in Remark's Markdown-to-HTML conversion itself. This may be a good thing for someone, or myself, to investigate in the future.

As an aside, Jekyll is a leading choice for statically generated blog content. It works very similarly to what I wanted to implement: a set of Markdown files that generate posts. StackOverflow, which serves millions of users every day, even wrote a blog post on the benefits they found after moving from their in-house blogging solution to Jekyll. However, after some research, I found that integrating Jekyll with a NextJS system wasn't very straightforward. If you're interested, search for Jekyll and NextJS to learn more about the various difficulties. Leave a comment below if you found it interesting or have insights on this problem, as I would like to know more myself.

Back to integrating Remark: the goal is to take Markdown content in and serve it. The first step is taking the Markdown content in, and something to consider was how my files and folders would be structured. For now, I am using a single directory with a numbering system, where each file is named after its projectId followed by .md (for example, 12.md). There is probably a better way to do this using a mapping system, but given the amount of time I want to put into this, it is good enough for now. The downside of this simple approach is that it ignores the potential of meaningful file names; project titles would be natural file names. This may present some difficulties down the road, which again alludes to not knowing what future constraints will come up.
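As a minimal sketch of this naming scheme, assuming the file names have already been read from the posts directory (for example with Node's fs.readdirSync), a hypothetical helper could keep only files named like 12.md and collect their ids:

```typescript
// Hypothetical helper (not from the Sapling codebase): given the file names in
// the posts directory, keep only names of the form "<projectId>.md" and
// return their numeric ids in ascending order.
function projectIdsFromFileNames(fileNames: string[]): number[] {
  const ids: number[] = [];
  for (const fileName of fileNames) {
    // Accept "1.md", "42.md", ...; reject "01.md", "draft.md", "notes.txt".
    const match = /^([1-9][0-9]*)\.md$/.exec(fileName);
    if (match !== null) {
      ids.push(Number(match[1]));
    }
  }
  // Numeric sort, so 10 comes after 2 rather than right after 1.
  return ids.sort((a, b) => a - b);
}

console.log(projectIdsFromFileNames(["2.md", "10.md", "1.md", "draft.md", "notes.txt"]));
// → [ 1, 2, 10 ]
```

Stray files such as drafts or notes are simply skipped, which keeps the directory forgiving while the numbering system stays the source of truth.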

The purpose of having a projectId is to be able to fetch a project when its id is entered into the URL. Using project ids avoids having to generate links purely from encoded project titles, which would be very ambiguous. The projectId is just a simple, unique number, and both the project title and the page itself are derived from it. The projectId comes in from user input in the URL.
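To illustrate the ambiguity, here is a hypothetical slug function of the kind often used to turn titles into URLs (an assumption for illustration, not something Sapling uses). Two different titles end up with the same slug, which a unique numeric projectId avoids:

```typescript
// Hypothetical slug function, for illustration only: lowercase the title,
// replace each run of non-alphanumeric characters with "-", then trim
// dashes from both ends.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("Hello, World!")); // → hello-world
console.log(slugify("Hello World"));   // → hello-world (same slug, different title)
```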

When the user enters a projectId into the URL, the page takes that id, first checks that it is a number and performs other sanitization, then fetches the contents of the matching .md file. The file's content is then given to Remark, which generates an HTML string that is injected into the page served to the client. If the project id doesn't exist, it simply serves a 404. Keyword "simply," another tenet of development. That's it, done.
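The flow above can be sketched as a single function. This is a minimal sketch under a few assumptions: the file reader and the Markdown converter are passed in as hypothetical parameters (in the real page they would be a file read and the Remark call), and the return value loosely follows the shape getStaticProps can return, where { notFound: true } (supported in newer NextJS versions) produces the 404 page:

```typescript
// Loosely modeled on what getStaticProps can return: props for the page,
// or { notFound: true } to serve the 404 page instead.
type ProjectResult =
  | { props: { projectId: number; contentHtml: string } }
  | { notFound: true };

// Hypothetical parameters, injected so the flow runs without a file system
// or a Markdown library:
//   readMarkdown   returns the contents of a file, or undefined if it is missing
//   markdownToHtml stands in for the Remark conversion step
function resolveProject(
  rawId: string,
  readMarkdown: (fileName: string) => string | undefined,
  markdownToHtml: (markdown: string) => string,
): ProjectResult {
  // 1. The id must be a positive integer written with digits only.
  if (!/^[1-9][0-9]*$/.test(rawId)) {
    return { notFound: true };
  }
  const projectId = Number(rawId);

  // 2. Fetch "<projectId>.md"; a missing file also means a 404.
  const markdown = readMarkdown(`${projectId}.md`);
  if (markdown === undefined) {
    return { notFound: true };
  }

  // 3. Convert to HTML; the page template then injects contentHtml.
  return { props: { projectId, contentHtml: markdownToHtml(markdown) } };
}

// Usage with an in-memory "directory" and a placeholder converter (not Remark).
const posts: Record<string, string> = { "1.md": "# Hello" };
const placeholderHtml = (md: string): string => `<pre>${md}</pre>`;
console.log(resolveProject("1", (fileName) => posts[fileName], placeholderHtml));
console.log(resolveProject("7", (fileName) => posts[fileName], placeholderHtml)); // → { notFound: true }
```

Injecting the reader and converter keeps the decision logic (valid id, existing file, 404 otherwise) separate from I/O, in the same spirit as keeping content and templates apart.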

We are using NextJS to build out Sapling Corporation, and the decision was a deliberate one. NextJS is quickly becoming a popular framework for building websites because it works well and has just enough optimization features to make it highly desirable without being bloated. Also, for personal reasons, it is what I use in professional development, so I'd like to learn more about it while building out Sapling Corp and writing about it.

One of those highly desirable optimization features is NextJS's ability to serve static pages that were rendered at build time. In other words, it keeps a pre-rendered copy of each page to serve. Build once, serve until the next build. This is a huge optimization for reducing server costs and improving performance. I may look into the actual numbers in the future; if anyone knows them, please let me know by leaving a comment. This static generation happens through NextJS's getStaticProps function. Because our page has a dynamic path (the URL takes in a project id), we also need the getStaticPaths function. I think of this as generating the keys and values of a cache of statically generated pages kept on the server: the paths are the keys, and the props are the values. And if you think about it, props are all we need to generate the page if we have a template.
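The keys-and-values analogy can be made concrete with a toy model. This is only an illustration of the analogy, not how NextJS works internally (it actually writes out pre-rendered HTML plus the props as JSON for each path); the path format and props shape here are assumptions:

```typescript
// Toy model of the keys/values analogy, not NextJS internals: at build time
// every path (key) is paired with its props (value); serving a request is
// then just a lookup plus the template.
type ProjectProps = { projectId: number; title: string };

function buildStaticCache(
  paths: string[],
  getProps: (path: string) => ProjectProps,
): Record<string, ProjectProps> {
  const staticCache: Record<string, ProjectProps> = {};
  for (const path of paths) {
    // The expensive work (reading files, converting Markdown) happens once, here.
    staticCache[path] = getProps(path);
  }
  return staticCache;
}

// The "template": given only the props, it can produce the page.
function renderProject(props: ProjectProps): string {
  return `<h1>${props.title}</h1>`;
}

// Hypothetical props source: derive the id from "/projects/<id>".
const builtCache = buildStaticCache(["/projects/1", "/projects/2"], (path) => {
  const projectId = Number(path.split("/")[2]);
  return { projectId, title: `Project ${projectId}` };
});
console.log(renderProject(builtCache["/projects/2"])); // → <h1>Project 2</h1>
```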

Using the getStaticProps and getStaticPaths functions was not straightforward to me initially, and I still don't fully understand them. However, to continue the analogy from the previous paragraph, I think of getStaticPaths as supplying the keys and getStaticProps as supplying the cached, hardcoded props stored under each key.

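With getStaticPaths, we list the values of the route parameters for which we want static content to be generated. (In NextJS these are dynamic route segments, such as the 100 in /projects/100, rather than query-string GET parameters.) For example, if we had a projects page that takes a projectId parameter, we could use getStaticPaths to pre-generate the page for projectId 100 with its own props, and for projectId 50 with its own, while projectId 10 is not listed at all. What happens to an unlisted id depends on the fallback setting: with fallback: false it gets a 404, while with fallback: true or 'blocking' the page is generated on demand when it is first requested.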
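As a sketch of what getStaticPaths returns for the example above, assuming a page at pages/projects/[projectId].js, here is a hypothetical helper that builds that object from a list of ids. The types are declared locally rather than imported from next so the sketch runs on its own:

```typescript
// Local types mirroring the object NextJS expects getStaticPaths to return
// for a page at pages/projects/[projectId].js.
type ProjectParams = { projectId: string };
type StaticPathsResult = {
  paths: { params: ProjectParams }[];
  fallback: boolean | "blocking";
};

// Hypothetical helper: in the real page the ids would come from the
// Markdown file names at build time.
function staticPathsFor(projectIds: number[]): StaticPathsResult {
  return {
    // NextJS expects route parameter values as strings, so convert the numbers.
    paths: projectIds.map((id) => ({ params: { projectId: String(id) } })),
    // With fallback: false, any id not listed (e.g. /projects/10) gets a 404.
    fallback: false,
  };
}

console.log(JSON.stringify(staticPathsFor([50, 100])));
// → {"paths":[{"params":{"projectId":"50"}},{"params":{"projectId":"100"}}],"fallback":false}
```

Inside the page, getStaticPaths would just return this object; switching fallback to true or "blocking" is what would let an unlisted id like 10 be rendered on demand instead of returning a 404.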

So where do I put them, and why? I put them in the page file, partly because the NextJS team did so in their example, and in fact NextJS only runs these functions when they are exported from a page. It also makes a lot of sense to put them in the earliest part, nearest to the entry point. This way, the information can flow down to the components that need it, without the complications of traversing through parents, siblings, and other potential family members, which does happen and becomes an issue surprisingly often.

Dynamic routes, the pages feature this project relies on, were introduced relatively recently, in NextJS version 9. Adding it so late in development probably means it was not part of the original architecture and was bolted on later. In C++, generics (templates) were also added well after the language was first designed, so it does happen. New features that make things easier, like smart pointers, are still being added to C++, although through distinct versions of the standard. Back on topic, the NextJS team probably made this trade-off because they saw the potential of pages and how important it would be for the framework. To put it as simply as possible, pages handle routing by using file names as paths. So, creating a file named projects.js in the pages directory creates the /projects route, and a file named [projectId].js inside pages/projects creates a dynamic route such as /projects/100. This is where I presume parameters and information gathering should occur, at least in the majority of cases.
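As a simplified sketch of the file-name convention (a hypothetical helper, not the code NextJS itself uses, and ignoring special files such as _app.js and the api folder), mapping a page file to its route looks roughly like this:

```typescript
// Hypothetical helper illustrating the pages-directory convention: strip the
// "pages/" prefix and the extension, drop "index" segments, and keep
// [param] segments as dynamic placeholders.
function routeFromPageFile(filePath: string): string {
  const withoutPrefix = filePath.replace(/^pages\//, "");
  const withoutExt = withoutPrefix.replace(/\.(js|jsx|ts|tsx)$/, "");
  const segments = withoutExt.split("/").filter((segment) => segment !== "index");
  return "/" + segments.join("/");
}

console.log(routeFromPageFile("pages/index.js"));                // → /
console.log(routeFromPageFile("pages/projects.js"));             // → /projects
console.log(routeFromPageFile("pages/projects/[projectId].js")); // → /projects/[projectId]
```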

Besides fetching the static props, I also had to sanitize the props coming in through the URL, because anything passed in by the user is susceptible to an injection attack. NextJS may already do this, but I do it myself before using the value, just in case. To sanitize the projectId variable, I simply check that it is a positive number using the parseInt function.
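One caveat worth knowing: parseInt is lenient, since it reads leading digits and stops at the first non-digit character. As a minimal sketch, assuming the id arrives as a string from the URL, here is parseInt on its own next to a stricter hypothetical helper:

```typescript
// parseInt stops at the first non-digit, so on its own it accepts strings
// that are not really numbers, and negative values too.
console.log(parseInt("5abc", 10)); // → 5
console.log(parseInt("-3", 10));   // → -3
console.log(parseInt("abc", 10));  // → NaN

// A stricter sketch (hypothetical helper): accept only digit strings with no
// leading zero, so the value is a positive integer, and return null otherwise.
function parseProjectId(raw: string): number | null {
  return /^[1-9][0-9]*$/.test(raw) ? Number(raw) : null;
}

console.log(parseProjectId("12"));   // → 12
console.log(parseProjectId("5abc")); // → null
```

Checking the whole string up front also rules out values like "../" before they can ever be combined with ".md" into a file path.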

Using the projectId, I turn it back into a string, append ".md" directly, and look for that exact file name in the project-pages directory. This is an MVP iteration, so that is okay for now. It's important to keep things simple and remember it's okay to do things that don't scale, something Paul Graham stresses in one of his better-written essays. We just want to see that it works, and we can spend more time on it later. To make this scale, the files should be given other names, with the file name looked up using the projectId. This adds another level of abstraction, which would help with security and with the overall design of this architecture.
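Both approaches can be sketched side by side. The manifest and its file names below are hypothetical, purely to show how a lookup table separates the URL id from the file name:

```typescript
// MVP approach described above: the file name is derived directly from the id.
function fileNameDirect(projectId: number): string {
  return `${projectId}.md`;
}

// Scalable variant (hypothetical manifest and file names): ids map to
// descriptive file names, so files can be named after their titles.
const projectFiles: Record<number, string> = {
  1: "markdown-driven-nextjs-blog.md",
  2: "another-project.md",
};

// Only ids present in the manifest resolve, so user input can never be
// turned into an arbitrary file path.
function fileNameFromManifest(projectId: number): string | undefined {
  return Object.prototype.hasOwnProperty.call(projectFiles, projectId)
    ? projectFiles[projectId]
    : undefined;
}

console.log(fileNameDirect(1));       // → 1.md
console.log(fileNameFromManifest(1)); // → markdown-driven-nextjs-blog.md
console.log(fileNameFromManifest(3)); // → undefined
```

The manifest also gives back the meaningful file names that the numbering system gives up, while the URL keeps using the stable numeric id.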

One of the last open questions I had was: how do I handle dynamic slugs when using getStaticPaths? The problem is that every path has to be specified exactly, including any additional optional segments. For example, /projects/1/hello is a different path from /projects/1 as far as getStaticPaths is concerned. I ended up posting a question on StackOverflow and received a nice answer suggesting rewrites to strip optional, superfluous URL segments before the request reaches the page. I will have to work on this soon.

I want the title to be ignored because static paths must be specific rather than dynamic: every path has to be known at build time. So we can't give a project an arbitrary title in the URL, whether because of a typo or for optimization/SEO reasons, without making code changes, and that is something I would like to avoid.
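Following the StackOverflow suggestion, a rewrite along these lines could drop the title segment. This is a hedged sketch rather than something deployed: it assumes a NextJS version with rewrites support (9.5 or later) and a page at pages/projects/[projectId].js, and the :title parameter name is an illustrative choice. It is written as a plain object so it runs on its own; in a real project this object is exported from next.config.js:

```typescript
// Shape of the rewrites option in next.config.js.
const nextConfig = {
  async rewrites() {
    return [
      {
        // A request for /projects/1/hello is served by the statically
        // generated /projects/1 page; the browser still shows /projects/1/hello.
        source: "/projects/:projectId/:title",
        destination: "/projects/:projectId",
      },
    ];
  },
};

nextConfig.rewrites().then((rules) => console.log(rules));
```

Because the title segment never reaches getStaticPaths, only the numeric ids need to be listed at build time, and the title in the URL can change freely without code changes.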

For now, this is where I have left it. Here are some notes to possibly revisit in the future.

  • May need to use redirects, not rewrites? No. Use rewrites.
    • Reason: a redirect changes the URL shown in the browser, while a rewrite keeps it.
  • API to return title, and other project contents