Long before the craze of new-age JavaScript libraries, the REST architectural style was laid out to standardise API design and practice. At its core, the specification describes what it takes for an API to be considered a REST API. While the practice of implementing such APIs persists to this day, much of the original thought behind the specification has shifted to adapt to the ever-changing nature of JSON-consuming JavaScript frameworks and libraries.
Note: This blog will not delve into every detail of constructing and implementing a RESTful API, nor will it cover HTMX setup and implementation. While we believe that a hands-on approach is key to better conceptual understanding, covering those aspects here would simply make this far too long. Technical implementation examples will be provided in a follow-up blog.
Instead, the blog focuses on one critical constraint often missing in many modern enterprise web applications: HATEOAS (Hypermedia As The Engine Of Application State). HATEOAS is a key principle of REST architecture that allows clients to navigate an API dynamically through hypermedia, ensuring a more adaptable and scalable approach to web development.
But first, what even is hypermedia?
Delving into hypermedia
Hypermedia is a fundamental concept in web development that builds on hypertext by incorporating a variety of media types—such as images, videos, and audio—into the linking structure of web content. Essentially, it transforms static web pages into dynamic, interactive experiences by enabling users to navigate through different types of media seamlessly. Traditionally, hypermedia relies on standard web protocols like HTTP to deliver these interconnected resources to browsers, providing a richer and more engaging user experience.
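As a small illustration (the URLs are placeholders), an ordinary HTML page is already hypermedia: links, images, and video are woven directly into the same document structure as the text:

```html
<p>
  Read the <a href="/annual-report">annual report</a>, or watch the summary:
</p>
<!-- Media elements participate in the same linking structure as text -->
<img src="/charts/revenue.png" alt="Revenue chart">
<video src="/media/summary.mp4" controls></video>
```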
Despite its potential, many of today's web applications are not fully leveraging hypermedia. Instead, they rely heavily on JavaScript to handle much of the application's functionality and interactivity.
This shift means that while hypermedia principles still underpin the basic structure of the web, the actual user interactions and data handling in modern web apps are frequently driven by JavaScript frameworks and libraries. These tools manage everything from dynamic content updates to complex user interfaces, often bypassing the more traditional hypermedia capabilities that browsers and HTTP inherently support.
This approach has its benefits, such as enhanced interactivity and real-time data processing, but it also means that the web's original hypermedia model is not being utilised to its full potential in many cases.
Understanding HATEOAS
The purpose of HATEOAS is to separate concerns between the client and server while allowing mutual communication through a unified language.
With hypermedia in place as the primary engine of a web application, the client side has to know very little about how to interact with the application, or about its interactions with the server side, beyond a very generic understanding of hypermedia.
By contrast, using JSON as the primary driver between the two imposes a fixed interface that is usually only shared through documentation (via tools such as Swagger).
If the restrictions imposed by HATEOAS are met, the server is able to evolve its functionality independently. We’ll see why that’s a big deal further down the line.
A HATEOAS example - HTML vs JSON
All of the above may seem a bit vague without a conceptual understanding of the problem at hand. We could talk about the pros and cons of each side, but doing so would be unfair given people’s tendency to stick to what they already know, so let’s look at an example instead.
Suppose we’re creating a bank API that serves individual users’ bank account information to the client. More importantly, upon serving the account information, we want every subsequent request the user can make to be included in the hypermedia response of the initial request. Following that logic, our server-side response would contain all the information needed to handle any further requests the user can make within the newly loaded page. In this way, RESTful interaction is driven by hypermedia rather than out-of-band information.
A practical example looks like the following:
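A minimal sketch of the triggering request, assuming an illustrative account ID of 12345, could be a plain HTML link:

```html
<!-- Following this link issues GET /accounts/12345 (the ID is illustrative) -->
<a href="/accounts/12345">View account</a>
```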
This would execute a GET request to the /accounts/{account_id} endpoint, which returns the following hypermedia response:
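A sketch of what such a hypermedia response might look like (account details and paths are illustrative):

```html
<div>
  <div>Account number: 12345</div>
  <div>Balance: $100.00 USD</div>
  <div>Links:
    <a href="/accounts/12345/deposits">deposits</a>
    <a href="/accounts/12345/withdrawals">withdrawals</a>
    <a href="/accounts/12345/transfers">transfers</a>
    <a href="/accounts/12345/close-requests">close requests</a>
  </div>
</div>
```

Each link advertises an action the user can take next; the client simply renders what it receives.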
The above presents exactly what has to be shown to the user; there’s no additional processing or logic the client needs to execute.
Now, suppose the account has been overdrawn, and the user cannot withdraw or transfer money. In such a case, the response would look like the following:
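A sketch of the overdrawn-state response, keeping the same illustrative account:

```html
<div>
  <div>Account number: 12345</div>
  <div>Balance: -$50.00 USD</div>
  <div>Links:
    <a href="/accounts/12345/deposits">deposits</a>
  </div>
</div>
```

The absence of the withdrawal and transfer links is itself the application state: the client renders what it’s given and needs no extra logic to enforce the restriction.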
Using the more traditional approach, the account status would’ve been included in the HTTP response upon which client-side logic would show/hide the links accordingly:
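A sketch of that traditional JSON response (field names are illustrative):

```json
{
  "account": {
    "account_number": 12345,
    "balance": {
      "currency": "USD",
      "value": -50.00
    },
    "status": "overdrawn"
  }
}
```

Here the client has to know what "overdrawn" means and hide the withdraw/transfer links itself.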
With this change in mind, instead of letting hypermedia handle itself as an all-encompassing solution, we’re forced to add additional logic on the client side in order to handle bank accounts in different states.
Such an approach can only cause issues down the line, and to understand why, let’s take a look at the broader picture.
The modern web development environment
In the current landscape of modern web development, the focus has shifted significantly from traditional server-side processing to more dynamic and interactive user experiences. As a result, the primary emphasis is being put on the client side of applications. That sacred place, the client side, is a playground with a limited set of constraints, allowing users to interact with most of what an application has to offer. Before delving into the issues outlined above, let’s consider an example of data flow for most common enterprise-level web applications (with separate frontend and backend services):
The user executes an action that sends out a request to an API endpoint
A server-side controller (or equivalent) parses the request and performs business logic based on passed parameters
The API returns a JSON response in a previously agreed-upon format
The JSON is consumed and parsed by the client-side JavaScript
An HTML-based page or drop-in replacement is constructed and rendered
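The last two steps of that flow can be sketched in a few lines of client-side JavaScript (the field names and markup are illustrative assumptions, not from any real API):

```javascript
// A minimal sketch of steps 4-5: the client parses the agreed-upon
// JSON payload and constructs the HTML fragment it needs to render.
function renderAccount(json) {
  const account = JSON.parse(json);
  // Build a drop-in HTML fragment from the parsed data.
  return `<div>Account ${account.number}: $${account.balance.toFixed(2)}</div>`;
}

// Example payload in the format agreed between frontend and backend.
console.log(renderAccount('{"number": 12345, "balance": 100}'));
// → <div>Account 12345: $100.00</div>
```

Note that the rendering knowledge (which fields exist, what they mean, how they map to markup) now lives on the client, which is exactly the coupling discussed below.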
This preconceived notion of client-server communication is the gold standard for most applications to date, and it does work, but to what extent?
The real problem - API churn
What has been outlined above is a prime example of API development as a whole, but what’s even more striking is the alignment between frontend and backend development. The current trend of development is very client-side heavy and usually goes something like this:
“This screen contains elements x, y, and z. We need a new endpoint that will perform some calculations, and we need the result returned in the format {x: , y: , z: }.”
By the design of HATEOAS, we should strive to avoid performing business logic on the client side to keep our APIs scalable and well maintained. That’s obviously not the case here. Eventually, such construction leads to APIs tightly coupled to screens, and to numerous endpoints that continuously require reworks or brand-new replacements, all in hopes of fulfilling the desires of the UI.
The issue is not that the API is overloaded; the issue is that the UI needs continuous change. With continuous change come new endpoints, which are usually tightly coupled to their specific screens/workflows. And don’t even get us started on the API variants that need to be covered for different platforms.
The solution!
Since we cannot bypass the continuous flux of ever-changing UIs, there has to be another way around it, and you’re right—there is! Why don’t we just open the gate on the client side and allow the client to request exactly what it needs from the server? After all, the tooling is already there: GraphQL, a query language for your API. But... is that the right decision?
Well, for some situations, opening up a structured query language to retrieve data from the server within the client isn’t such a bad idea, but with that, a new issue arises—security.
Instead of letting the server handle the whole of the client-side logic (as with HATEOAS), we’re now depending on the client to retrieve data from the server and perform logic on its own.
The threat in disguise here is the end-user. Let’s give a brief example; say we have a simple GraphQL query:
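A minimal sketch of such a query (the schema and field names are illustrative):

```graphql
query {
  account(id: 12345) {
    balance
  }
}
```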
With it, if the end-user does find the execution point of the GraphQL query, they may as well alter it to be something like:
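For instance (again, the schema is illustrative), the altered query might request far more than the UI was ever meant to show:

```graphql
query {
  account(id: 12345) {
    balance
    owner {
      name
      email
      phoneNumber
    }
  }
}
```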
Oops. What have we done here? By filtering through HTTP requests, the end-user can find where the GraphQL query is being executed and request additional resources to be provided. Do you see where we’re going with this?
Sure, one may counter with, “But you can add security to each of the GraphQL properties you’re trying to access,” but with the UI’s ever-present changes, adding that security will just create unnecessary, difficult-to-deal-with overhead.
With the above approach in mind, there’s always going to be a tradeoff between how much expressiveness you’re willing to give client-side devs and how much of a security headache that will turn out to be. Ideally, if there were a place where the client side could request as much information using a structured query language to build up the ever-changing UI, we would all be using it, wouldn’t we?
Well... such a place does exist! And here’s where we come full circle: it’s the server side! If developers could query the data they need directly with SQL and build the response on the server, without additional data parsing on the client, it’d be both secure and scalable (and probably faster, too)!
But is it even viable?
While a shift in the development paradigm and process of implementation is required, the internet has come to the point where most of the gaps or quirks of heavy JS-based frameworks can be covered through a simpler interface without the need for additional JSON processing.
With this approach, we can move beyond JSON and utilise powerful tools like htmx, which significantly enhances HTML. It enriches HTML attributes with an integrated API, making data manipulation easier and adding features such as CSS transitions, AJAX, WebSockets, and Server-Sent Events, all through htmx attributes within regular HTML tags. Events and asynchronous behaviour can be managed using _hyperscript (created by the same developers behind htmx). Although this might conflict with the Single Responsibility Principle (SRP), some argue that it improves code readability by handling everything related to a component directly within the HTML, rather than separating it into another file.
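As a small sketch of what that looks like in practice (the endpoint and element IDs are made up for illustration), a plain HTML attribute is enough to drive an AJAX request and swap the returned fragment into the page:

```html
<!-- Clicking the button issues GET /accounts/12345/balance via AJAX
     and swaps the HTML fragment the server returns into #balance -->
<button hx-get="/accounts/12345/balance"
        hx-target="#balance"
        hx-swap="innerHTML">
  Refresh balance
</button>
<div id="balance">$100.00 USD</div>
```

The server responds with ready-to-render HTML, so no JSON parsing or client-side templating is needed.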
This approach also makes it an excellent fit for styling via Tailwind CSS, which couples styling and functionality within the same HTML tags. Everything is done from one file/resource.
Since the goal of tooling such as htmx is to reduce client-side clutter, the target audience has always been primarily backend engineers who want a way to construct UIs without learning a whole new system of JavaScript-based frameworks and following each of their best practices.
All in all, even if this doesn’t sound promising, we’d advise you to look around and see an ever-growing community of people who think alike. The most promising benefit of this approach is simplicity, and with that comes the freedom to focus on other, harder things, like security (which is often forgotten or left for last).
Do remember, none of this has been written to bash JavaScript or its frameworks. It has been written in the hope that someone will find themselves with a new set of tools, able to expand their existing knowledge with powerful tooling ready to be used out of the box, without having to embed themselves in a highly complex client-side framework.
Finally, it is important to acknowledge that perspectives on web development vary. My colleague and I have had a rather lengthy discussion about HATEOAS, and while we think differently, each of us has our own nit-picks. Different methodologies can offer unique advantages depending on the context and specific needs of a project. This doesn’t imply that one approach is superior to the other; rather, it showcases the diversity of tools and principles available to developers, allowing them to choose the best fit for their objectives.