Jakob's Blog coding for fun and growth

The struggle with API design

GraphQL could safe many software projects from failure because it reduces the total workload. With examples from my own work experience, I explain why I think like that.

In this post, I am writing about the benefits of using GraphQL that have a positive effect on the development process of software projects. There may also be technical aspects to it but I will not cover them for now.

I discuss this with one year of developer experience with RESTful APIs. I have worked on both ends, the service-provider end, and the service-consumer end. However, I have not worked with GraphQL in professional projects, yet. Therefore, I am mostly presenting prediction, how I think GraphQL would have helped our team, given the basic GraphQL knowledge I gained from talks, blogs, and one of my hobby projects.

What is GraphQL?

GraphQL is an open standard for APIs that evolved from efforts at Facebook. It is now maintained by the GraphQL Foundation which belongs to the Linux Foundation.

The standard defines a language that also includes its own type system, and it is not bound to any specific platforms.

If this is the first time you hear about GraphQL, I suggest you have a look at the introduction on the official GraphQL website before reading on, or it might be difficult to follow the examples I present.

Problems with RESTful APIs

Now, I want to tell you about some problems I faced at some point during my work. Problems that are related to API design that I think would be solved nicely with a GraphQL API.

If you want to read about the general pros and cons of GraphQL, which are not really covered in my post, I can recommend this article by Weblab Technology at Medium.com.

Problem 1: Using one interface for different customers is messy

In this first example, the client was all-in for the digitalization of their business. Our company was the new player and we had to integrate our system into what had already been in place.

Another company had a REST-API that provided a part of the data that we needed to other clients of them. But we only needed a subset of the fields served on the existing interfaces, plus, we also required some additional fields. Our client asked their partner to extend their API for that purpose.

The result was that these additional fields were served in a new interface specifically designed for us, one for every major entity, because they did not want to add more fields to the existing REST-endpoints. Ergo, there were two separate requests necessary to read all fields of an entity.

Also, we had to read more fields than we needed on the old interfaces. For some entities, these were over a hundred fields, including personal data that is protected by privacy laws.

Solution: With a GraphQL API, this mess could have been avoided quite easily. Adding more field to an already existing GraphQL interface does not affect current users. And the redundant fields would never be sent to us because we could specify the fields we want in the query.

Problem 2: Nested queries are sequentially executed

Assume a RESTful API that allows querying personal data of customers. For example, this is how a lookup for the person with ID 1001 looks:

GET https://my-api.sample/person/1001 HTTP/1.1
Accept: application/json
...

And this would be the response

{
  "displayName": "Patrik Muller",
  "company": 2001,
  ...
  }

Now consider this basic requirement. Given a set of customer IDs, display a table with their names and their companies cities.

To receive this data, it is necessary to query for the company and then for the address of the company.

GET https://my-api.sample/company/2001 HTTP/1.1
...
{
  "companyName": "Brown's Chocolate Factory",
  "address": 3001,
  ...
  }

Followed by:

GET https://my-api.sample/address/3001 HTTP/1.1
...
{
  "street": "Main St",
  "city": "London",
  ...
  }

These are three sequential requests that cannot run in parallel, only to display two simple fields:

Patrik Muller, London

When I was working with similar APIs, our team had two main issues with such sequential request.

  • High latency. (One full RTT per sequential request).
  • If the API provider is not stable, any of the three requests could fail and error-recovery-handling becomes that much more complicated.

Solution: Using a GraphQL API, one single request is sufficient.

GraphQL query:

{
  person(id: "1001") {
    displayName
    company {
      address{
        city
      }
    }
  }
}

GraphQL response:

{
  "data": {
    "person": {
      "displayName": "Patrik Muller",
      "company":{
        "address": {
          "city": "London"
        }
      }
    }
  }
}

Problem 3: Manually written documentation

Who has never had the problem that the specification document has not been updated? Or that it simply contained errors from the beginning?

Manually updating the specification of all the fields of each REST-endpoint is time-consuming and many developers hate doing it. Unless some tools automate this, errors are guaranteed to happen.

There are of course tools like Swagger that streamline the API developing process. If you can use this in your team, great, then this problem does not affect you. However, at least from my experience, there are still plenty of projects that are not using such tools.

Solution: With GraphQL, an interactive, live, and auto-generated documentation is given for free. In the GraphiQL browser interface, it is super easy to quickly see all possible queries of any GraphQL API and their details.

Problem 4: API Design pitfalls

This last problem is a bit less straight-forward. But I think it also the most interesting one presented here.

In my opinion, some pitfalls are naturally avoided when using GraphQL to design an API. Mostly because the relations between entities are expressed so much better. But let us have a look at an example.

Consider this HTTP request:

GET https://third-party-api.sample/person/1001/address/2007-02-22 HTTP/1.1
...

It requests an address of a person. 1001 is the ID of the person and 2007-02-22 is the date from which onwards the address has been valid.

This is a slightly simplified version of a real RESTful interface that has been provided to my team about a year ago. The data model deliberately used a compound primary key (PK) for addresses, consisting of the two fields person_id and valid_from.

This design basically allowed only one address per person at the time, which was what the had client asked for.

In practice, when a user changed an address in their system, internally that created a new address with a new valid_from value and the old data row had been kept for the records. (Changes on the same day had been handled differently but this is not relevant here.)

For our project, we had to read in all addresses and keep a copy in our own database. Software users would then be allowed to change the address in either the preexisting system or in the new system that has been connected to our database.

The old system remained the master-system and thus we had to make sure changes made in the new system were delegated to the old system before persisting them. To make this possible, the API also provided a service to modify an address, using the same resource path that has already been presented and a version field made sure no two writes collide.

PUT https://third-party-api.sample/person/1001/address/2007-02-22 HTTP/1.1
{
  "street": "Main St",
  "city": "London",
  "version": 1,
  ...
}

So far, I have just described the API and its use. At this point, everything seemed to be manageable to us.

But then, when testing our implementation, we came to realize, that in their system, under some circumstances, it is possible to modify the valid_from field of an existing address, without creating a new address. Basically, they modified the PK of an existing entity.

Internally, the master system did not have a problem with that. But for us, when we tried to read or write an entity that we had seen before, the interface suddenly answered 404 Resource not found.

If you now think, well then just look up what is the currently valid address of a person by querying the person as shown in the Patrik Muller example earlier, you are out of luck. The API did not provide a way to query that, it had not been designed to query for addresses in this way.

This problem arose because the RESTful API was basically just a direct view into their database. And the team responsible for it did not see it as a bug in their API. Their interface correctly served the specified subset of their database. The rest was up to my team to figure out.

It was indeed a major problem. We had no way to know which was the new primary key of an address that we have seen before.


Because of this issue and a number of similar API flaws, the project managers and developers of both teams ended up wasting a lot of time in meetings to figure out possible solutions which can be implemented quickly and cost-efficiently.

Unfortunately, I would have to explain too many steps and provide a lot more information about the system in order to describe the solution we ended up with. So, I will just leave it at the problem statement and not go into more technical details.

Also unfortunate, simply translating the RESTful API to a GraphQL API is not going to solve this problem. Such an API design that exposes entities under mutable PKs is simply broken. But here is why I believe a GraphQL based API might have helped anyway.

Solution: In a GraphQL API, to leverage the power of graph queries, all relations between entities are expressed using type fields. Most likely, an entity like person would have a field for its addresses.

A natural way of designing a unique address per person and date would be the following.

type Person {
  id: ID!
  address(valid_from: Date!): Address!
  ...
}

This can still result in a resource not found error when the valid_from field has been updated internally. But now I can think of a number of easy fixes.

  • On the back-end of the interface, look for the address that has been valid at the date specified with valid_from, even if the date does not match exactly.
  • Make the valid_from argument optional and serve the most recent address when no argument is provided.
  • Remove the valid_from argument and only make the most recent address of a person accessible over the interface.

The first approach requires no change of the API at all, and the other proposed changes are relatively small as well.

Admittedly, all these solutions are also feasible in a REST based design if the initial person resource already includes the address field. My point here being, when designing an API with GraphQL, relations between entities are much more naturally mapped which makes the design more flexible. With a RESTful approach, it is easier to run into problems that are hard to fix.

Conclusion

To summarize my points, a GraphQL API reduces the total workload significantly, compared to RESTful APIs.

A single GraphQL is more powerful and precise than a traditional plain HTTP request. This saves time for developers using the API.

GraphQL APIs also tend to be much more flexible and therefore allow for easier extension than APIs following the REST concept. This can save a lot of time for everyone in the team of a complex software project.

Technology Stack