Graph Databases, GraphQL and IAM

Published 11 February 2020

Abstract

The Digital Enterprise requires speed, scale and contextual awareness across increasingly complex and diverse relationships. The Lightweight Directory Access Protocol (LDAP) has been the standard for Identity and Access Management (IAM)-centric enterprise directories for 30 years. LDAP relies on a hierarchical data model that begins with a top-level root entry, then moves to subordinate branches and ends in leaf nodes.

LDAP has traditionally supported security operations by querying authentication and authorization attributes to make informed security decisions. The current and future challenge is the requirement for an increasingly larger array of context signals (identity attributes, devices, location, source, etc.) that, in turn, lead to complex LDAP database structures, and often result in the need to use meta- or virtual-directory solutions.

As a result, vendors have long been investigating the use of non-hierarchical database models, most notably the RDBMS, to support scaling directories by attaching them to highly performant, replication-ready databases. The challenge is that relational databases also introduce complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing.

This, in turn, has led to the investigation of other database alternatives, most notably GraphQL and graph databases. A graph database uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. The “graph” relates the data items to a collection of nodes and edges, with the edges representing relationships. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation.

This report starts by providing a graph database and GraphQL level-set, then evaluates whether this approach has long-term merit as a solid database foundation for IAM solutions in general as well as specific IAM use cases such as Customer IAM (CIAM).

Authors:

Doug Simmons

Principal Consulting Analyst


Archie Reed

Principal Consulting Analyst


Executive Summary

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph, which relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation.

Graph databases have emerged over the past few years as reasonably good alternatives to the rigid schema of hierarchical databases such as LDAP, as well as complex, costly join operations inherent with relational databases. But the graph database, while becoming increasingly popular with social networking solutions such as Facebook, LinkedIn, Twitter and Google in order to maintain complex relationships (e.g., “friends”) among end users, is a relatively new concept. Do graph databases hold the key to large-scale directory services, alongside distributed data and contextual signal sources, in support of IAM?

The answer is “most likely”. Over the course of the next 3-5 years, TechVision expects a newer breed of IAM solutions with graph database underpinnings to begin to overtake the technologies we have used for the past few decades. Our principal recommendation is that you begin to investigate this fascinating way of managing identity data soon. In this way, you will have had a useful introduction to the universe of the graph and can better prepare your organization for the next wave of IAM solutions.

For those of you in the throes of developing a new CIAM infrastructure to replace an aging or under-performing platform, we strongly recommend that you prioritize CIAM solutions that incorporate graph technology. For ‘Microsoft shops’, the writing is on the wall; it would behoove you to start your journey with an up-to-date mindset based on where both Microsoft and the IAM industry are headed. In this report, we will describe how access control policies may lend themselves to better management within a graph database.

TechVision Research expects graph database technology to grow rapidly, and in the case of IoT implementations this growth may be dramatic. There are already some very good tools on the market, so the time may be right to begin thinking about your ‘Next-gen IAM’ solution being built on a graph database foundation. In particular, graph databases are gaining popularity in support of graph-based access control (GBAC), supporting a declarative way to define access rights, task assignments, recipients and content in information systems. The access rights are granted to objects like files or documents, but also to business objects like an account. Compared with role-based access control (RBAC) and attribute-based access control (ABAC), GBAC has so far been shown to return run-time authorization decisions much faster (some claim more than twice as fast).

Given that runtime access controls have been a challenge for those responsible for information security for the past few decades, we believe it is a good time for most enterprises to familiarize themselves with graph database and GraphQL technology, bring some flavor of this in-house and ‘experiment with it in a sandbox’. Perhaps a small ‘tiger team’ can be formed in order to build some meaningful expertise in the use of this technology for IAM, whether consumer focused (CIAM), for improved access control policy management or for IoT scenario testing.

Introduction

The Lightweight Directory Access Protocol (LDAP) has been the industry standard for Identity and Access Management (IAM)-centric enterprise directories for almost three decades. Today, it would be difficult, if not impossible, to find an organization that does not rely on LDAP for user (and device) authentication and authorization. To make this point even stronger, consider that Microsoft Active Directory and Azure Active Directory have been built on the LDAP model since inception. Having worked in countless customer organizations for the past thirty years as IAM consultants and architects, we can assure you that LDAP has become one of the most pervasive subsystems in the history of IT.

Derived from the International Organization for Standardization’s (ISO) 1988 X.500 Directory Services model, LDAP relies on a hierarchical database model that begins with a top-level root entry, branches off into subordinate branches and ends in leaf nodes. One of the principal challenges with using a hierarchical structure, or namespace, for directories is that the schema and namespace itself often need to change in concert with business focus, organizational changes (including mergers and acquisitions), and the general evolution of computing and IAM itself.

Adding to the problem, as the LDAP directories in many enterprise IAM systems have grown in size and complexity, they have become slower and less responsive. When this happens, directory performance (or lack thereof) can impact the performance of every application that depends on them.

While the hierarchical database model used by LDAP has endured, vendors have been investigating the use of non-hierarchical database models, most notably the relational database, almost since the inception of LDAP. The attraction is that relational databases provide highly performant, replication-ready support that can also serve applications via Structured Query Language (SQL). An additional benefit of SQL is that in-house skills in it may already be abundant within the enterprise. However, relational databases retain their own level of complexity related to efficiently joining data across numerous rows and tables during runtime authentication and authorization processing. The need for scale and performance without the complexity of relational databases has led to further investigation into other database alternatives, most notably graph databases and GraphQL. A key issue the industry is addressing, and the focus of this paper, is whether graph databases may be a better alternative to the traditional hierarchical LDAP or relational SQL structures.

These discussions have been going on for the past several years, and graph databases are beginning to emerge as a reasonably good alternative to the rigid schema of hierarchical databases such as LDAP. TechVision consistently recommends flexibility in future-state IAM strategies, and rigid schemas are not consistent with this goal. That said, the graph database, while becoming increasingly popular with social networking solutions such as Facebook, Netflix, Twitter, LinkedIn and Google to maintain complex relationships (e.g., “friends”) amongst end users at scale, is a relatively new concept. The question we are looking to answer is whether graph databases hold the key to large-scale directory services, alongside distributed data and signal sources, in support of IAM. The following sections examine graph database technology and compare and contrast it with hierarchical and relational databases in support of scalable IAM solutions.

The Graph Database

So, what is a graph database? A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept is the graph, which relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. Such relationships allow data in the storage system to be linked together directly and, in many cases, retrieved with one operation. The concept of the graph database is illustrated below, courtesy of Neo4j, a leader in graph database technology.

Figure 1: Graph Database Concept

Graph databases hold the relationships between data as a priority. Querying relationships within a graph database is fast because the data relationships themselves are perpetually stored within the database. Furthermore, data relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

Graph databases portray the data as it is viewed conceptually. This is accomplished by transferring the data into nodes and its relationships into edges. Figure 2 below illustrates the node and edge relationships in the graph database.

Figure 2: Nodes and Edges in the Graph Database

A graph within graph databases is a set of objects, either a node or an edge – defined as follows:

Nodes represent entities or instances such as people, businesses, accounts, or any other item to be tracked. They are roughly the equivalent of a record, relation, or row in a relational database, or a document in a document-store database.
Edges, also termed relationships, are the lines that connect nodes to other nodes, representing the relationship between them. Meaningful patterns emerge when viewing the connections and interconnections of nodes, properties and edges. The edges can be either directed or undirected. In an undirected graph, an edge from one point to another has one meaning. In a directed graph, the edges connecting two different points have different meanings depending on their direction. Edges are the key concept in graph databases, representing an abstraction that is not directly implemented in either a hierarchical or relational model.
Properties are information germane to nodes. For example, if TechVision Research were one of the nodes, it might be tied to properties such as website, research documents, or words that start with the letter T, depending on which aspects of TechVision Research are germane to a given database. The concepts of nodes, edges and properties are further illustrated below.
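
To make the three building blocks concrete, the following is a minimal Python sketch of a property graph. It is an illustration only, not the data model of any particular product; the class names, the PUBLISHED relationship and the example.com website value are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity to be tracked, carrying its own properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A named relationship between two nodes."""
    source: Node
    target: Node
    relationship: str
    directed: bool = True


def describe(edge: Edge) -> str:
    """Render an edge as text; the arrow reflects whether it is directed."""
    arrow = "->" if edge.directed else "--"
    return (f"{edge.source.properties['name']} "
            f"-[{edge.relationship}]{arrow} "
            f"{edge.target.properties['name']}")


# Hypothetical sample data: the website value is a placeholder.
techvision = Node("Organization",
                  {"name": "TechVision Research", "website": "https://example.com"})
report = Node("Document", {"name": "Graph Databases, GraphQL and IAM"})
published = Edge(techvision, report, "PUBLISHED")
```

Calling describe(published) yields a one-line rendering of the directed relationship between the organization node and the document node.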

Figure 3: Graph Database Drilldown

It is important to note that the underlying storage mechanism of graph databases can vary. Some implementations utilize a relational database to store the graph data in a table (note that a table is a logical element – meaning this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored).

Other graph database implementations use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. An example of a NoSQL database that utilizes this method is ArangoDB, a native multi-model database that supports graphs as one of its data models. It stores graphs by holding edges and nodes in separate collections of documents. A node is represented like any other document store, but edges that link two different nodes hold special “linking” attributes inside each document.
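
The document-store approach can be sketched with plain Python dictionaries standing in for collections. This does not use any ArangoDB driver; the `_from` and `_to` attribute names mirror the ones ArangoDB uses on edge documents, while the collection names and sample records are hypothetical.

```python
# Node collections: ordinary documents keyed by a document id.
users = {
    "users/alice": {"_id": "users/alice", "name": "Alice"},
}
groups = {
    "groups/admins": {"_id": "groups/admins", "name": "Administrators"},
    "groups/auditors": {"_id": "groups/auditors", "name": "Auditors"},
}

# Edge collection: each document links two nodes via its linking attributes.
member_of = [
    {"_from": "users/alice", "_to": "groups/admins"},
    {"_from": "users/alice", "_to": "groups/auditors"},
]


def neighbors(edge_collection, start_id):
    """Return the ids of all nodes reachable over one edge from start_id."""
    return [edge["_to"] for edge in edge_collection if edge["_from"] == start_id]
```

Here neighbors(member_of, "users/alice") returns the ids of the two group documents, which can then be fetched from the groups collection like any other document.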

Data lookup performance is dependent on the access speed from one node to another. The concept of index-free adjacency is important to understanding how graph databases work. Graph databases look up adjacent nodes in a graph via a direct walk of memory (i.e., pointer hopping), which currently is the fastest way computers can look at data relationships. Therefore, graph databases utilize direct physical RAM addresses for each node in the graph. Each node’s RAM address is a pointer that is created when data is loaded into the graph database, not when the data is queried. This means there is no need for an index (or many indices) to look up data relationships; they are hard-coded within each node.

Figure 4: Index-free Adjacency Means Hard-coding Data Relationships

As a result, index-free adjacency usually requires the nodes to have direct physical RAM addresses and physically point to other adjacent nodes to enable extremely fast data retrieval. Native graph databases use index-free adjacency to process create, read, update and delete (CRUD) operations on the stored data. A native graph system with index-free adjacency does not have to move through any other type of data structure to find links between the nodes. Directly related nodes in a graph are stored in the cache once one of the nodes is retrieved, making subsequent lookups even faster than the first time a user fetches a node. However, this advantage comes at a cost: index-free adjacency sacrifices the efficiency of queries that do not use such graph traversals.
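
The difference between pointer hopping and index-based lookup can be approximated in Python, where object references play the role of memory pointers. This is a conceptual sketch only; real graph engines manage storage very differently, and all names and sample data below are hypothetical.

```python
class GraphNode:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.adjacent = []  # references created at load time, not query time


def link(source, target):
    source.adjacent.append(target)


alice, laptop, admins = GraphNode("alice"), GraphNode("laptop-17"), GraphNode("admins")
link(alice, laptop)
link(alice, admins)

# Index-based alternative: relationships stored as key pairs, so every hop
# needs a scan of the edge table plus a lookup in the node index.
node_index = {n.name: n for n in (alice, laptop, admins)}
edge_table = [("alice", "laptop-17"), ("alice", "admins")]


def neighbors_via_pointers(node):
    """Follow stored references directly; no index is consulted."""
    return [n.name for n in node.adjacent]


def neighbors_via_index(name):
    """Resolve each relationship through the edge table and node index."""
    return [node_index[dst].name for src, dst in edge_table if src == name]
```

Both functions return the same neighbors; the pointer version simply skips the intermediate lookups, which is the property the term index-free adjacency refers to.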

As discussed, graph databases are part of the NoSQL databases created to address the limitations that exist with relational databases. While the graph model explicitly lays out the dependencies between nodes of data, the relational model and other NoSQL database models link the data by implicit connections. Graph databases, by design, allow simple and fast retrieval of complex hierarchical structures that are difficult to model in relational systems.

Retrieving data from a graph database requires a query language other than SQL, which was designed for the manipulation of data in a relational system and therefore cannot efficiently handle graph traversal. At present, no single graph query language has been universally adopted in the same way as SQL was for relational databases, and there are a wide variety of systems, most often tightly tied to one product. Some standardization efforts have occurred, leading to multi-vendor query languages like Gremlin, SPARQL, Cypher and GraphQL – discussed in more detail below. In addition to having query language interfaces, many graph databases are accessed through application programming interfaces (APIs).

The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table. However, a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored. Thus, such an approach can hinder the performance of a graph by imposing this overhead. Others, however, use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. As you might suspect, such non-relational underlying databases are able to function at the ‘true speed of the graph’ – which means, much faster because of the lack of overhead. As previously discussed, most graph databases based on non-relational storage engines also add the concept of tags or properties, which are essentially relationships having a pointer to another document. As we have said, graph databases allow data elements to be categorized for easy retrieval at large scale, which will be particularly important as we evolve toward IoT security/management and Zero Trust infrastructure reliant upon such scale and performance.

While there can be differences below the GraphQL interface, there is generally only one query target. So, unlike REST APIs, GraphQL allows, and in fact requires, a client-side call to specify the properties of the query across objects. This is more flexible than an RDBMS or LDAP server and even more so than a strict API, Interface Definition Language (IDL) or strongly defined schema. The approach is as simple as is illustrated below:

Source: https://ldapwiki.com/wiki/GraphQL

Figure 5: GraphQL Queries
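
As a rough illustration of the client specifying exactly which properties it wants, the sketch below builds the JSON body that a client would typically POST to a single GraphQL endpoint. The schema (a user type with displayName, devices and groups fields) is hypothetical and would differ for any real IAM service.

```python
import json

# Hypothetical query: the client names every field it needs, across objects.
QUERY = """
query UserAccess($id: ID!) {
  user(id: $id) {
    displayName
    devices { hostname }
    groups { name }
  }
}
"""


def build_request(user_id):
    """Return the JSON request body: the query text plus its variables."""
    return json.dumps({"query": QUERY, "variables": {"id": user_id}})
```

Because the same endpoint serves every query, requesting a different set of attributes means changing only the query text, not the URL or the server-side API.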

Note that graph databases differ from graph compute engines. Graph databases are technologies that are translations of the relational online transaction processing (OLTP) databases. On the other hand, graph compute engines are used in online analytical processing (OLAP) for bulk analysis. Graph databases have attracted considerable attention over the past 20+ years due to the successes of major technology corporations such as Netflix, Facebook, LinkedIn, Amazon and many others using proprietary graph databases, and the introduction of open-source graph databases.

Graph, Hierarchical or Relational?

We discussed previously that while LDAP directories are inherently hierarchical in structure, many in the IAM and directory industry have long been searching for a higher-performance solution than the hierarchical tree traversal required by LDAP. The most notable alternative up until the semi-recent arrival of large-scale graph databases has been the relational database. In this section, we’ll briefly investigate the hierarchical and relational database models and examine their fitness for purpose vis-à-vis a scalable enterprise, customer or IoT directory foundation. We’ll start with the traditional hierarchical database model supporting LDAP.

Hierarchical LDAP Database

The Lightweight Directory Access Protocol (LDAP) has been the industry standard for Identity and Access Management (IAM)-centric enterprise directories for almost three decades. Derived from the International Organization for Standardization’s (ISO) 1988 X.500 Directory Services model, LDAP relies on a hierarchical database model that begins with a top-level root entry, called the parent, branches off into subordinate branches and ends in leaf nodes, all referred to as ‘children’ of the parent root entry. This hierarchy is illustrated below.

Figure 6: Hierarchical Database Conceptual Model

Typically, LDAP directories are “under the covers” of most enterprise directories (including Microsoft Active Directory and Azure Active Directory), and they contain information about the entire workforce/constituency as well as other information of general interest within and between organizations. LDAP directories may also contain the information about customers, partners, suppliers and other “non-employee/contractor” end users that access multiple externally facing applications. In many cases, enterprises combine users from external business partner companies with internal employees and contractors to create an “extended enterprise directory”, while keeping customer identity information in a separate directory instance. The traditional LDAP directory is the “de facto” standard and continues to be extended. A decision that enterprises will ultimately need to make is if they will continue to bolt on additional capabilities on top of LDAP directories or embrace a new approach.

Applications also place content, structure, and distribution requirements on the directory. Fortunately, many directory-enabled applications use industry-standard information object classes (such as persons, groups, roles, and organizational units) and attributes (such as common name, telephone number, and e-mail address) as defined by the LDAPv3 standard inetOrgPerson. Pretty much every directory service on the market today includes these schema elements as part of its base schema, so little or no modification is required to support many, but certainly not all, out-of-the-box applications.

An enterprise directory must generally support both standard information objects and be extensible to support new applications or releases of applications as business needs arise. So, while such a standard LDAP schema is necessary, it is often insufficient, or becomes overly complex, to support the full range of enterprise applications. As we consider the scope of applications and services associated with the Digital Enterprise, this challenge will become even more pressing. This is because organizations also require the ability to define custom objects (such as “parts supplier”) or attributes (such as “cost center”). As a result, individual vendors (e.g., Microsoft) as well as industry consortia (e.g., the IETF) have also produced directory schema extensions that are useful additions to the base schema and promote interoperability.

These objects are commonly defined through separate data sources and require consolidation into the LDAP directory, resulting in requirements for synchronization, normalization and duplication of data, often achieved through virtual or meta-directories designed to manage these processes. While these can be used to create close to real-time synchronization, the reality is that synchronization is more often done in batches, on a timed basis. This can create timing issues, essentially a risk management situation, especially for security scenarios that are dependent on up-to-date, non-stale attribute information. Of course, running continuous queries against native data sources can also create issues, such as performance impacts or a set of security issues relating to pass-through data access.

Finally, divergent content types, performance and application availability requirements depend on the directory to support a flexible structure and to use its innate distribution and replication to accommodate application needs. All of these factors can often add up to a highly complex and potentially error-prone directory environment.

Hierarchical Structuring and Naming

Directories provide hierarchical structuring capabilities similar to file systems, referred to as the “namespace.” The namespace sets boundaries on how the directory tree can be built by defining rules for the length and character strings for names. Organizations may need to structure and name information in the directory to support geographical, organizational, functional, or other relationships.

Naming is a seemingly simple but often elusive process for organizations to define. Most organizations require account names that are unique across all directory tiers and roles, that are easy for workers or customers to remember, and that are interoperable with many applications. Figure 7 below illustrates the hierarchical naming context.

Figure 7: LDAP Hierarchical Naming Context

While Figure 7 is a relatively simplistic rendering of an LDAP namespace, it does accurately reflect most enterprises’ naming structure in that it is relatively ‘flat’ – meaning, the organizational unit (OU)=People subtree is not further broken up into multiple subtrees based on geography or business unit structure. This is because directory naming structures with deep hierarchies can complicate object lookup for runtime authentication and authorization by making names longer and more complex. This often leads to considerably longer search times to authenticate and authorize someone or something. Because the enterprise directory supports runtime (read: real time) events, longer search times are not desirable.

Arriving at a useful and durable directory naming hierarchy requires significant forethought. With LDAP directories, architects can build only one hierarchy that emanates from a single root. Also, each entry can exist in only one container (i.e., OU). These constraints have several ramifications for the architecture and subsequent performance of the directory.
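
The single-hierarchy, single-container constraint can be seen in how a distinguished name is assembled. The sketch below is a simplification that assumes no escaped commas in names; the entry and container names are hypothetical, and the DN is written leaf-first with the domain split into dc components, as LDAP string DNs conventionally are.

```python
def build_dn(rdns):
    """Join relative distinguished names, leaf first, into a DN string."""
    return ",".join(rdns)


def parent_dn(dn):
    """Return the DN of the one container that holds this entry."""
    return dn.split(",", 1)[1]


def move_entry(dn, new_container):
    """Moving an entry to another container necessarily changes its DN."""
    leaf = dn.split(",", 1)[0]
    return f"{leaf},{new_container}"


tracy = build_dn(["cn=tracy white", "ou=people", "dc=acme", "dc=com"])
```

Because the container is part of the name, move_entry(tracy, "ou=contractors,dc=acme,dc=com") produces a different DN, and any application that stored the old DN would no longer find the entry.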

The Relational Database Option

For almost 40 years now, relational databases have been the de facto standard for large-scale data storage systems. However, relational models require a strict schema definition, and the notion of data normalization enforces limitations on how relationships can be queried. The increasing amount of data needing to be processed in real time has become an additional problem for the relational model.

Traditionally, databases have been designed with the relational model, where data is normalized to support ACID (Atomicity, Consistency, Isolation, Durability) transactions. ACID defines a set of database transaction properties intended to guarantee validity even in the event of errors, power failures, etc. In the context of databases, a sequence of database operations that satisfies the ACID properties (and these can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. Data normalization removes any duplicate data within the database. The goal of data normalization is to preserve data consistency. The relational model enforces ACID transactions by separating data into many tables.
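
The funds-transfer example can be sketched with Python’s built-in sqlite3 module. The table layout, account names and the non-negative balance rule are assumptions made for illustration; the point is that both updates succeed together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("checking", 100), ("savings", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

A transfer that would overdraw the source account fails on the second update, and the rollback also discards the first update, so the balances are left exactly as they were.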

Again, relational models enforce heavy data normalization in order to guarantee consistency. One of the key motivations for the relational model’s design was to achieve fast row-by-row access. However, problems arise when there is a need to form complex relationships between the stored data. Although relationships can be analyzed with the relational model, complex queries performing many join operations on many different attributes over several tables are required, causing additional overhead. This is particularly problematic when supporting runtime IAM services such as user (or thing) authentication and authorization. Just as with hierarchical LDAP databases, a complex underlying data structure hurts performance, in some cases quite considerably.

The Graph Database in Comparison

As opposed to typical relational and hierarchical databases, graph databases are often faster for associative data sets and map more directly to the structure of object-oriented applications. They can scale more naturally to large datasets because they do not typically need join operations, which can often be quite CPU intensive. Because graph databases rely less on a rigid schema, they are marketed as more suitable to manage ad hoc and changing data with evolving schemas.

Figure 8: Relational vs. Graph Database Features

Conversely (when compared to graph databases), relational database management systems are typically faster at performing the same operation on large numbers of data elements, permitting the manipulation of the data in its natural structure.

The relational model gathers data using information in the data. For example, one might look for all the “users” whose phone number contains the area code “415”. This would be done by searching selected datastores, or tables, looking in the selected phone number fields for the string “415”. This can be a time-consuming process in large tables, so relational databases offer indexes, which allow data to be stored in a smaller sub-table, containing only the selected data and a unique key (or primary key) of the record. If the phone numbers are indexed, the same search would occur in the smaller index table, gathering the keys of matching records, and then looking in the main data table for the records with those keys. Generally, the tables are physically stored so that lookups on these keys are fast.
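
The area-code search described above can be sketched in plain Python, with a dictionary standing in for the index sub-table. The user records and phone numbers are hypothetical sample data, and the area code is assumed to be the first three characters of the number.

```python
# Main data table: primary key -> record.
users = {
    1: {"name": "Ana", "phone": "415-555-0100"},
    2: {"name": "Ben", "phone": "307-555-0101"},
    3: {"name": "Cy", "phone": "415-555-0102"},
}


def scan(area_code):
    """Without an index: inspect the phone field of every record."""
    return [key for key, row in users.items()
            if row["phone"].startswith(area_code)]


# Index sub-table: only the selected data (area code) and the primary keys.
area_index = {}
for key, row in users.items():
    area_index.setdefault(row["phone"][:3], []).append(key)


def lookup(area_code):
    """With an index: gather matching keys, then fetch those records."""
    return [users[key]["name"] for key in area_index.get(area_code, [])]
```

Both approaches find the same records; the indexed version touches only the matching keys instead of every row in the table.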

Relational databases do not inherently contain the idea of fixed relationships between records. Instead, related data is linked by storing one record’s unique key in another record’s data. For example, a table containing email addresses for users might hold a data item called “userKey”, which contains the primary key of the user record it is associated with. In order to link users and their email addresses, the system first looks up the selected user records’ primary keys, looks for those keys in the “userKey” column in the email table (or, more likely, an index of them), extracts the email data, and then links the user and email records to make composite records containing all the selected data. This operation, termed a join, can be computationally expensive. Depending on the complexity of the query, the number of joins, and the indexing of the various keys, the system may have to search through multiple tables and indexes and then sort it all to match it together.
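
A minimal sketch of this join, using Python’s built-in sqlite3 module, is shown here. The “userKey” column follows the text; the table names, index name and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, area_code TEXT);
CREATE TABLE emails (address TEXT, userKey INTEGER REFERENCES users(id));
CREATE INDEX emails_userKey ON emails(userKey);
INSERT INTO users VALUES (1, 'Ana', '307'), (2, 'Ben', '415');
INSERT INTO emails VALUES ('ana@example.com', 1),
                          ('ana.work@example.com', 1),
                          ('ben@example.com', 2);
""")


def emails_for_area(area_code):
    """Join users to emails through userKey to build composite records."""
    rows = conn.execute(
        "SELECT u.name, e.address "
        "FROM users u JOIN emails e ON e.userKey = u.id "
        "WHERE u.area_code = ? ORDER BY e.address",
        (area_code,),
    )
    return rows.fetchall()
```

The database engine must match each selected user key against the emails table (here via the emails_userKey index) before it can return the combined rows, which is the work a join represents.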

Similarly, using a hierarchical LDAP directory would require the query to be constructed with the exact distinguished name, which contains the specific names of the root, branches and leaf entries (e.g., dc=acme.com, ou=people, cn=tracy white). This level of detail is typically too complex for an application to know for every single search it needs to conduct to authenticate and authorize a person or thing. In addition, the attributes that comprise the branch and leaf entries are often changing due to typical IAM functions such as ‘joiner, mover, leaver’ (JML), a name change, etc., so the distinguished name used as the search string may be inaccurate and ultimately fail to locate the appropriate record. To address this, LDAP directory administrators often create multiple indices that are searchable by applications and in many ways perform a ‘reverse lookup’: searching on an indexed attribute, such as last name (i.e., surname), and using the index to locate the distinguished name of the LDAP entry that contains all of the user attributes corresponding to that last name. Of course, many people have the same last name, so more compute processing is required to find the correct record.
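
The ‘reverse lookup’ pattern can be sketched as follows, with dictionaries standing in for the directory and its surname index. This is not an LDAP client; the entries, attribute values and helper names are hypothetical, and DNs are written leaf-first in the conventional LDAP string form.

```python
from collections import defaultdict

# Directory entries keyed by distinguished name.
directory = {
    "cn=tracy white,ou=people,dc=acme,dc=com":
        {"sn": "white", "givenName": "tracy", "mail": "twhite@example.com"},
    "cn=sam white,ou=people,dc=acme,dc=com":
        {"sn": "white", "givenName": "sam", "mail": "swhite@example.com"},
    "cn=lee chan,ou=people,dc=acme,dc=com":
        {"sn": "chan", "givenName": "lee", "mail": "lchan@example.com"},
}


def build_index(entries, attribute):
    """Map each attribute value to the DNs of the entries that carry it."""
    index = defaultdict(list)
    for dn, attrs in entries.items():
        index[attrs[attribute]].append(dn)
    return index


surname_index = build_index(directory, "sn")


def find_user(surname, given_name):
    """Use the index to get candidate DNs, then filter to the right entry."""
    candidates = surname_index.get(surname, [])
    return [dn for dn in candidates
            if directory[dn]["givenName"] == given_name]
```

The index narrows the search to entries sharing the surname, but the extra filtering step on givenName is still needed because several people have the same last name.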

In contrast, graph databases directly store the relationships between records. Instead of an email address being found by looking up its user’s key in the “userKey” column (RDBMS) or by searching on the email address, whether indexed or not (LDAP), the user record contains a pointer that directly refers to the email address record. That is, having selected a user, the pointer can be followed directly to the email records; there is no need to search the email table or LDAP index to find the matching records. This can eliminate resource-intensive and costly join operations. For example, if one searches for all of the email addresses for users in area code “307”, the engine would first perform a conventional search to find the users in “307”, but then retrieve the email addresses by following the links found in those records. A relational database would first find all the users in “307”, extract a list of the primary keys, perform another search for any records in the email table with those primary keys, and link the matching records together. A hierarchical LDAP directory would function similarly: it would search an index for “307”, then perform a search of all the distinguished names that pertain to the outcome of that search, then search each of these records for the matching email address. For these types of common operations, graph databases would theoretically be faster.
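
The graph version of the “307” email search can be sketched with Python object references acting as the stored links. The class names and sample records are hypothetical; the sketch only shows that once a user is selected, no second search is needed to reach the email records.

```python
class EmailRecord:
    def __init__(self, address):
        self.address = address


class UserRecord:
    def __init__(self, name, area_code, emails):
        self.name = name
        self.area_code = area_code
        self.emails = emails  # direct references to the email records


all_users = [
    UserRecord("Ana", "307", [EmailRecord("ana@example.com"),
                              EmailRecord("ana.work@example.com")]),
    UserRecord("Ben", "415", [EmailRecord("ben@example.com")]),
]


def emails_in_area(user_records, area_code):
    """Conventional search for the users, then follow links to the emails."""
    return [email.address
            for user in user_records if user.area_code == area_code
            for email in user.emails]
```

The only search is the initial filter on area code; the email addresses are reached by following the references held in each matching user record.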

The true value of the graph approach becomes evident when one performs searches that are more than one level deep. For example, consider a search for users who have “subscribers” (a table linking users to other users) in the “307” area code: a relational database would first search for all the users with an area code of “307”, then search the subscribers table for any of those users, and then finally search the users table to retrieve the matching users. An LDAP search would happen in much the same way – multiple searches to narrow down the data being queried. In contrast, a graph database would search for all the users in “307”, then follow the backlinks through the subscriber relationship to find the subscriber users. The graph avoids several searches, look-ups, and the memory usage involved in holding the temporary data from multiple records needed to construct the output. The relative advantage of graph retrieval grows with the complexity of a query.
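To make the “subscribers” example concrete, the following Python sketch contrasts the two retrieval styles on the same small data set. All names and records are hypothetical, and plain in-memory structures stand in for a real relational or graph engine; the point is only that the relational style scans a link table while the graph style follows references already stored on each node.

```python
# Hypothetical users and "subscribes to" relationships.
users = {
    1: {"name": "Ana", "area_code": "307"},
    2: {"name": "Ben", "area_code": "212"},
    3: {"name": "Cy", "area_code": "307"},
    4: {"name": "Dee", "area_code": "415"},
}
# Relational style: a link table of (subscriber_id, user_id) rows.
subscriptions = [(2, 1), (4, 1), (4, 3), (1, 2)]


def subscribers_relational(area_code):
    """Search users, then scan the link table, then look the users up again."""
    target_ids = {uid for uid, u in users.items() if u["area_code"] == area_code}
    subscriber_ids = {sub for sub, uid in subscriptions if uid in target_ids}
    return sorted(users[sub]["name"] for sub in subscriber_ids)


class Node:
    """Graph style: each node holds direct references to its subscribers."""

    def __init__(self, name, area_code):
        self.name = name
        self.area_code = area_code
        self.subscribers = []  # backlinks to other Node objects


def build_graph():
    nodes = {uid: Node(u["name"], u["area_code"]) for uid, u in users.items()}
    for sub, uid in subscriptions:
        nodes[uid].subscribers.append(nodes[sub])
    return list(nodes.values())


def subscribers_graph(nodes, area_code):
    """Search once for the starting nodes, then simply follow the backlinks."""
    found = {}
    for node in nodes:
        if node.area_code == area_code:
            for sub in node.subscribers:
                found[id(sub)] = sub
    return sorted(sub.name for sub in found.values())


graph = build_graph()
relational_result = subscribers_relational("307")
graph_result = subscribers_graph(graph, "307")
```

Both functions return the same subscribers; the difference is that the graph version performs no second or third search once the starting nodes are known.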

Relational and hierarchical databases are very well suited to flat data layouts, where relationships between data are one or two levels deep. For example, an accounting database might need to look up all the line items for all the invoices for a given customer, a three-join query. An LDAP directory would see this as a two-level query – one to find the customer under the root (e.g., dc=acme.com) and one to search the invoice database for all invoices corresponding to the customer’s user ID as retrieved from the LDAP directory. On the other hand, graph databases are aimed at datasets that contain many more links. They are especially well suited to social networking systems, where the “friends” relationship is essentially unbounded. These properties make graph databases naturally suited to types of searches that are increasingly common in online systems, and in big data environments. For this reason, graph databases are becoming very popular for large online systems like Facebook, LinkedIn, Google, Netflix, Amazon, Twitter and similar systems with deep links between records.

It is critical to note, however, that having a flexible schema places more burden on the application or client side, as it must have a clear understanding of what data is being used and what the relationships mean. Further, the ability to more flexibly define data and its relationships means that changes to those definitions can significantly impact a call to act on that data. That means applications can become tightly coupled to the specific GraphQL implementation while, at the same time, not having a defined semantic model of the data they are using. Therefore, despite graph databases’ advantages and recent popularity relative to relational databases, we recommend that the graph model itself not be the sole reason to replace an existing relational database. A graph database may become relevant if there is evidence of an order-of-magnitude performance improvement and lower latency. We have always said “there are horses for courses”, so graph databases might not fit the bill for overly complex and often-changing identity data relationships.

GraphQL

GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL was developed internally by Facebook in 2012 before being publicly released in 2015. In 2018, the GraphQL project was moved from Facebook to the newly established GraphQL Foundation, hosted by the non-profit Linux Foundation.

GraphQL provides an approach to developing web APIs and has been compared with REST and other web service architectures. It allows clients to define the structure of the data required, and the server returns data in that same structure. This prevents excessively large amounts of data from being returned, but it has implications for how effective web caching of query results can be. The flexibility and richness of the query language also add complexity that may not be worthwhile for simple APIs. GraphQL consists of a type system, query language and execution semantics, static validation, and type introspection.
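The idea that the client defines the shape of the response can be sketched in a few lines of Python. This is not a GraphQL implementation; the `select` helper, the selection format (a nested dictionary) and the sample record are assumptions made for illustration.

```python
def select(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (a leaf field) or to a nested
    selection dictionary (for objects and lists of objects).
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = select(value, sub_selection) if sub_selection else value
    return result


# Hypothetical user record held by the server.
user = {
    "id": "u1",
    "name": "Tracy White",
    "phone": "307-555-0100",
    "emails": [
        {"address": "tracy@acme.example", "verified": True, "created": "2019-04-01"},
    ],
}

# The client asks only for the name and the email addresses.
selection = {"name": None, "emails": {"address": None}}
shaped = select(user, selection)
```

The result mirrors the selection exactly: the id, phone number and the extra email metadata are never returned, which is how over-fetching is avoided.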

GraphQL supports reading, writing (mutating), and subscribing to changes to data (real-time updates, most commonly implemented using WebSockets). Major GraphQL clients include Apollo Client and Relay. GraphQL servers are available for multiple languages, including Haskell, JavaScript, Perl, Python, Ruby, Java, C#, Scala, Go, Elixir, Erlang, PHP, R, and Clojure.

Graph Databases, GraphQL and IAM

What does it all mean? Consider that the challenges of IAM and directory services just may be perfect applications of graph technology. To use any other approach (purpose-built, out-of-the-box or otherwise) appears to be choosing an inferior solution for crucial technology that resides at the core of most enterprise applications. We are presently witnessing an explosive increase in networked services and resources as Zero Trust models, Customer IAM (CIAM) and the Internet of Things (IoT) take hold, so we have to wonder whether IAM systems are up to the task of securing the billions of new end users and devices expected to come online in the next few years.

A graph database may be the right IAM repository solution for a variety of important reasons. For example, by choosing a graph approach to IAM, enterprises could:

Handle organizational changes easily in one place and have them automatically affect an entire organization and its systems;
Describe all people, entities and resources fully using graph’s rich relationship and metadata models;
Include employees, partners, customers, suppliers, and outside services and resources to enable secure management of the extended enterprise;
Build directories of any size—even with billions of parties and resources—that use graph structures to maintain responsive scale;
Create complex, densely connected, access-control structures, approval chains and workflows;
Define and maintain any combination of hierarchical and non-hierarchical organizational and approval structures.

Even with enormous, highly connected IAM datasets of entities and resources, native-graph query engines can traverse millions of relationships per second to maintain application performance and user productivity.

Runtime Access Control

Graph databases are gaining popularity in support of graph-based access control (GBAC), which offers a declarative way to define access rights, task assignments, recipients and content in information systems. Access rights are granted to objects such as files or documents, but also to business objects such as an account. Compared with role-based access control (RBAC) and attribute-based access control (ABAC), GBAC has so far been shown to return run-time authorization decisions much faster (some claim more than twice as fast).

Since graph database technology allows an application (or administrator) to query relationships in any direction, enterprises can use it to perform a variety of top-down and bottom-up IAM queries such as:

Which applications can a specific user access?
Which users are permitted to access a specific application?
Which resources—products, services, documents, etc.—can a specific user access or an administrator manage?
Given a resource, who can modify its settings?
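The top-down and bottom-up questions above can be sketched with a small in-memory graph that stores each edge in both directions. This is a simplified Python illustration, not a graph database; the node naming scheme (user:, group:, app: prefixes) and the sample memberships are hypothetical.

```python
from collections import defaultdict


class AccessGraph:
    """Directed graph that can be traversed forwards and backwards."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # node -> nodes it points to
        self.in_edges = defaultdict(set)   # node -> nodes that point to it

    def add_edge(self, source, target):
        self.out_edges[source].add(target)
        self.in_edges[target].add(source)

    @staticmethod
    def _reach(start, edges):
        """Collect every node reachable from `start` along `edges`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return seen

    def apps_for_user(self, user):
        """Top-down: which applications can this user access?"""
        reached = self._reach(user, self.out_edges)
        return sorted(n for n in reached if n.startswith("app:"))

    def users_for_app(self, app):
        """Bottom-up: which users are permitted to access this application?"""
        reached = self._reach(app, self.in_edges)
        return sorted(n for n in reached if n.startswith("user:"))


graph = AccessGraph()
graph.add_edge("user:tracy", "group:finance")  # membership
graph.add_edge("user:sam", "group:finance")
graph.add_edge("user:sam", "group:it")
graph.add_edge("group:finance", "app:ledger")  # entitlement
graph.add_edge("group:it", "app:helpdesk")

tracy_apps = graph.apps_for_user("user:tracy")
ledger_users = graph.users_for_app("app:ledger")
```

The same stored relationships answer both questions; only the direction of traversal changes.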

With the advent of IoT “things” that are coming online, we’ll need the ability to manage billions of new identities and their defining attributes and contexts. This includes device-related data such as:

Owner(s)
Other identities that may or may not have access to device data or functions
Other services that manage telemetry data to or from devices

The writing is on the wall for a faster, relationship-centric underlying database technology. For example, how many authentications and authorizations per second will a typical present-day IAM system be able to handle when billions of things attempt to authenticate from around the world? Make no mistake, we will soon be dealing with “big (IAM) data”, where a graph database backend makes a lot of economic and performance sense.

The high performance of graph-based IAM solutions can theoretically turn response times that often run to seconds or minutes in hierarchical and relational data stores into milliseconds. Such speed makes graph-based IAM particularly applicable for applications with large audiences, many resources, and complex connections, including social networks, customer portals, content management, document systems and federated services.

One leading graph database-focused IAM vendor, Nulli, had this to say at the Identiverse conference: “As we expand our horizons across ZT, CIAM and IoT, access rights and related policies will become more and more intricate. Access policies to any given application, service, network or device are currently determined not only by business requirements, but also by legal and regulatory compliance. It will become necessary to evaluate the relationships that people, networks, applications and devices have with one another.”

Traditional SQL or LDAP backend systems have a difficult time modeling or representing these relationships and will struggle under real-time IoT environments. As we have shown, graph databases are a logical backend solution to support these relationship-based models. Graph databases can handle the immense volume of relationship data while addressing the latency challenges presented by LDAP and SQL. In this paradigm, graphs represent the access policies themselves, and applications can query these policies through a query interface such as GraphQL or Cypher.

This model becomes increasingly important as we recognize that our existing access policies may not be relevant tomorrow. As business models, regulations and the underlying technologies evolve, there is a way to improve the extensibility of access management systems so that such access policy evolution has a minimal impact on the IAM service. However, because IAM solutions currently expose services such as authentication and authorization through non-standard RESTful APIs, any new functionality typically requires yet another API or changes to the existing ones. This often impacts all the applications, systems and devices that rely on the modified APIs.

GraphQL, by contrast, is not tied to any specific backend data store, and a GraphQL server in front of the graph database exposes a single HTTP endpoint that can service any query. The GraphQL client running in the IAM subsystem issues GraphQL queries and makes sense of the responses sent by the server. A significant advantage of this model is that one can modify the server schema and run queries against it right away without deploying new REST endpoints or modifying existing APIs. Modeling access policies in graphs and implementing the access requests through GraphQL can provide improved IAM flexibility and performance without sacrificing security. For more details on GBAC use cases and configuration, we recommend Neo4j’s GBAC tutorial at https://neo4j.com/graphgist/entitlements-and-access-control.
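As a rough illustration of the single-endpoint model, the Python sketch below builds the HTTP request a client might send. The endpoint URL, the query and its field names (user, applications) are hypothetical and do not describe any particular vendor’s schema; only the general convention of POSTing a JSON body with "query" and "variables" keys is assumed.

```python
import json
import urllib.request

# Hypothetical endpoint; every query, whatever its shape, goes to this one URL.
GRAPHQL_URL = "https://iam.example.com/graphql"

# Hypothetical query asking which applications a user can access.
APPS_FOR_USER = """
query AppsForUser($userId: ID!) {
  user(id: $userId) {
    name
    applications { name }
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Wrap a GraphQL query and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(APPS_FOR_USER, {"userId": "u1"})

# Sending is left out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

A new question, such as which users can reach a given application, would be a different query string sent to the same URL rather than a new endpoint.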

In summary, there is an element of versioning that commonly occurs with data management queries. Whether the queries are LDAP, RDBMS or REST API based, changing the schema of a data source can break a client. GraphQL addresses the problem of API versioning and maintenance by forcing clients to specify exactly which fields they require. The schema, which clients can inspect through introspection, records which fields are deprecated, and API developers can proactively reach out to known consumers of those fields to help them migrate away.
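Because every query names its fields explicitly, a server or API gateway can tell precisely which clients still depend on a deprecated field. The Python sketch below shows the idea with a hypothetical schema fragment; in GraphQL itself this information is carried by the `@deprecated` directive and surfaced through the `isDeprecated` and `deprecationReason` introspection fields.

```python
# Hypothetical schema fragment: field name -> deprecation reason (None if current).
USER_FIELDS = {
    "id": None,
    "name": None,
    "mail": "Use 'emails' instead.",
    "emails": None,
}


def deprecated_fields_used(requested_fields, schema_fields=USER_FIELDS):
    """Return the deprecated fields a query asks for, with their reasons."""
    unknown = [f for f in requested_fields if f not in schema_fields]
    if unknown:
        # A real GraphQL server rejects such a query during validation.
        raise ValueError(f"Unknown fields: {unknown}")
    return {f: schema_fields[f] for f in requested_fields if schema_fields[f]}


warnings = deprecated_fields_used(["name", "mail"])
clean = deprecated_fields_used(["name", "emails"])
```

Logging the output per client gives API developers the list of consumers to contact before a deprecated field is finally removed.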

Vendors of Interest

There is much happening in the IAM market with respect to graph database and GraphQL adoption. In this section, we highlight a small but meaningful set of vendors who are either “all in” or appear to be heading in that direction. We will continue to update our research clients as momentum builds (or diminishes) in this space.

Microsoft

When Microsoft acquired LinkedIn in 2016, it was evident to us that this was a big play to establish the massive LinkedIn member base as a key part of Microsoft’s ‘enterprise’ customer base. Microsoft quickly began using the social network’s relationship graph as a massively complex data set that links millions of people (nodes) to each other (edges).

LinkedIn is a huge NoSQL graph database that uses a schema-less approach to manage semi-structured data. Each node in the graph is an individual, with all his or her profile data. Each node is linked to others: tens or hundreds for people with a few connections, thousands for highly connected individuals. Queries traverse those connections, letting you find all the people you know working on IAM, or who are based in Denver, or who used to work at Goldman Sachs, for example.

Microsoft has called the Microsoft Graph the company’s “most important” bet. There is a lot of data in the Microsoft Graph, with tools both for consumer information and for business information. Elements associated with Microsoft accounts, like the new Activity Stream and the Device Graph, are the basis for device-roaming features like the Continue on My PC tools recently released for iOS and Android (similar to Apple’s iCloud account-based Handoff capability in iOS), which Microsoft is encouraging Universal Windows Platform (UWP) developers to build into their code.

In addition to the Microsoft Graph and LinkedIn, Microsoft is supporting other graphs with APIs:

Dynamics 365 has the Common Data Service, a way of describing standard items in a business. With the Common Data Service, organizations can extend a standard schema with their own models of a customer or products.
Cosmos DB builds on a JSON document database with different API sets, including one for developing and managing organization-specific graph databases at scale.
Microsoft’s Security Graph is used to assess and manage threats, exposed to enterprise applications through tools like Azure Active Directory’s conditional-access feature.

Microsoft is also enabling its customers to use graph queries across multiple graphs to extract useful data that can help drive business decisions. The ability to query the edges of a graph, rather than a specific node, assists enterprise customers in understanding the relationships between nodes.

Microsoft is offering an alternative to traditional database-driven decision-support tools through extensive use of graph database technology. With all of these tools, enterprise customers can make complex cross-graph queries focusing not just on individual nodes in those graphs but also on the edges (links) between nodes.

As an example, this is being exposed in the Bing for Business tool that adds information from a corporate Active Directory and other sources to Bing searches when a user is logged in to an Azure Active Directory account. Results are dynamically generated from Microsoft Graph queries that return details of, for example, where someone is in the organization chart, along with related content from the wider web and from documents they may have shared internally. Future releases will bring in more of Microsoft’s graphs, providing conditional access features and exposing external relationships via LinkedIn.

ForgeRock (with Nulli)

Core to the ForgeRock Identity Platform is the design and implementation of access control policies. In its most common form, an access control rule is specified by subjects, objects, permissions, and conditions, and lets the access management (AM) system determine whether a user (subject) is able to perform an operation on a resource (object) under the current condition. In partnership with IAM integrator Nulli, ForgeRock has developed a simple-to-deploy access management solution in which ForgeRock’s web access manager, OpenAM, is used to protect resources. Users’ authentication history is utilized to control access to those resources. To access a protected application or service, a user is required to be authenticated to OpenAM. The OpenAM extension consists of two plugins: a post-authentication plugin (PAP) and a policy condition plugin. The former stores users’ authentication history and the latter uses this data to make informed decisions on granting access to protected resources. The following provides a high-level description of the two pieces:

OpenAM PAP is invoked after the user completes an authentication process. Nulli provides a plugin that updates the authentication history of a user based on their authentication result, which can be a failure, a successful login, or a logout. The authentication history graph is bootstrapped with OpenAM registered users and a calendar. Upon a (successful or failed) login event, a new authentication node is created and placed between the user and the past authentication nodes. The new node is also linked to the calendar as well as to a client node, which embodies client information such as IP, agent, etc. Upon a logout event, the corresponding authentication node (created at login) is updated. Below is an illustration from the graph database of the recent authentication events by a hypothetical user named “Aaren”:

Figure 9: ForgeRock and Nulli Using the Graph Database

The solution also implements a policy condition plugin, which queries the graph database to return policy advice based on the user’s client, which can be of type “trusted”, “private”, “public”, or “adversarial”.
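The two plugins described above can be pictured with a small Python sketch. It is not the Nulli or OpenAM code: the class names, the use of a date string in place of a calendar node, and in particular the mapping from client type to advice ("allow", "step-up", "deny") are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthEvent:
    """One authentication node in a user's history chain."""
    result: str        # "success" or "failure"
    client_ip: str     # stands in for the linked client node
    client_type: str   # "trusted", "private", "public" or "adversarial"
    date: str          # stands in for the link to the calendar
    logged_out: bool = False
    previous: Optional["AuthEvent"] = None  # link to the older event


class UserNode:
    def __init__(self, name):
        self.name = name
        self.latest = None  # most recent authentication node

    def record_login(self, result, client_ip, client_type, date):
        """Insert a new node between the user and the past authentication nodes."""
        event = AuthEvent(result, client_ip, client_type, date, previous=self.latest)
        self.latest = event
        return event

    def record_logout(self):
        """Update the node created by the most recent open, successful login."""
        event = self.latest
        while event is not None:
            if event.result == "success" and not event.logged_out:
                event.logged_out = True
                return event
            event = event.previous
        return None

    def history(self):
        """Walk the chain from the newest event to the oldest."""
        event = self.latest
        while event is not None:
            yield event
            event = event.previous


# Assumed mapping from client type to policy advice (not from the vendor).
ADVICE_BY_CLIENT_TYPE = {
    "trusted": "allow",
    "private": "allow",
    "public": "step-up",
    "adversarial": "deny",
}


def policy_advice(user):
    """Policy condition: base the advice on the latest authentication event."""
    latest = user.latest
    if latest is None or latest.result != "success":
        return "deny"
    return ADVICE_BY_CLIENT_TYPE.get(latest.client_type, "deny")


aaren = UserNode("Aaren")
aaren.record_login("failure", "203.0.113.7", "public", "2020-02-10")
aaren.record_login("success", "203.0.113.7", "public", "2020-02-11")
advice = policy_advice(aaren)
```

Because each new event is linked in front of the older ones, the policy condition reads the most recent context in one step and can walk further back only when a policy needs more history.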

From this perspective, access control policies are naturally seen as graphs that connect users, resources, and conditions through permission decisions. Graph databases therefore provide a unique opportunity for expressing and evaluating access policies. According to ForgeRock, there are two obvious advantages to using a graph database:

Graph databases excel at querying connected or adjacent data from known starting points. In the access management realm, the starting point and the resource being requested are both known. Graph traversal allows the intermediary relationships (in this case, policy) to be evaluated quickly even in complex scenarios.
Graph interfaces provide a unique way of viewing data; they allow for a very high level of visual inspection. Given the appropriate graphical interface, this will help security administrators better comprehend access control policies, check consistency in policy updates, and avoid possible conflicts.

Neo4j

Neo4j is marketed as “a highly scalable native graph database, purpose-built to leverage not only data but also data relationships.” Using Neo4j, developers can build applications that traverse large, interconnected datasets in real time. Powered by a native graph storage and processing engine, Neo4j is intended to deliver “an intuitive, flexible and secure database for unique, actionable insights”.

The Neo4j database is marketed with the following key characteristics:

Performance – a native graph database provides real-time performance for multi-hop queries on large, interconnected/distributed datasets.
High availability – Raft-based causal clustering, rolling upgrades and hot backups are supported out-of-the-box; the graph database is built for 99.999% uptime, 24×7.
Agility – Neo4j’s property graph model is easily adapted to business processes and changes.
Security – supports LDAPv3 directory services, security event logging and role-based access control (RBAC).
Developer friendly – fully supports Cypher and GraphQL.
Scalable – minimal server footprint required.

Neo4j feels that access control and authorization solutions powered by graph databases are particularly applicable in the areas of content management, federated authorization services, social networking preferences and software as a service (SaaS) offerings, where they realize minutes-to-milliseconds increases in performance over their relational database predecessors.

As an example, Neo4j cites a customer, Telenor Norway, an international mobile network operator. For several years, it has offered its largest business customers the ability to self-service their accounts. Using a browser-based application, administrators within each of these customer organizations can add and remove services on behalf of their employees. To ensure users and administrators see and change only those parts of the organization and the services they are entitled to manage, the application employs a complex identity and access management system which assigns privileges to millions of users across tens of millions of product and service instances.

Below is an example of Telenor’s data model. Due to performance and responsiveness issues, Telenor decided to replace its existing IAM system with a graph database solution. Their original system used a relational database, which used recursive JOINs to model complex organizational structures and product hierarchies. Because of the join-intensive model, their most important queries were unacceptably slow. In contrast, once they implemented a graph database solution, Telenor realized the performance, scalability and adaptability necessary for handling their identity and access management needs, reducing queries that once took many minutes to milliseconds.

Figure 10: Case Study: Telenor Data Structure
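The kind of query at the heart of this case study, finding everything an administrator may manage beneath a point in an organizational hierarchy, can be sketched in Python as a simple traversal. The organization names, subscriptions and administrators below are invented for illustration and are not taken from Telenor’s actual data model.

```python
# Hypothetical customer hierarchy: organizational unit -> child units.
children = {
    "Acme Group": ["Acme Nordic", "Acme UK"],
    "Acme Nordic": ["Acme Oslo"],
    "Acme UK": [],
    "Acme Oslo": [],
}
# Hypothetical subscriptions attached to each unit.
subscriptions = {
    "Acme Nordic": ["mobile-001"],
    "Acme Oslo": ["mobile-002", "broadband-003"],
    "Acme UK": ["mobile-004"],
}
# Hypothetical administrators and the units they are entitled to manage.
admin_of = {"kari": ["Acme Nordic"], "lee": ["Acme UK"]}


def manageable_subscriptions(admin):
    """Follow the hierarchy downwards from the units the admin manages."""
    result, seen = [], set()
    stack = list(admin_of.get(admin, []))
    while stack:
        unit = stack.pop()
        if unit in seen:
            continue
        seen.add(unit)
        result.extend(subscriptions.get(unit, []))
        stack.extend(children.get(unit, []))
    return sorted(result)


kari_can_manage = manageable_subscriptions("kari")
```

In a relational model each level of this hierarchy typically costs another self-join, whereas a traversal only visits the units that actually lie beneath the administrator’s starting point, regardless of how deep the hierarchy is.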

Recommendations

We have discussed how graph database technology is spreading in use within large-scale identity data systems such as social media networks. Graph databases are also gaining attention from the mainstream IAM community as an alternative to enterprise, customer or IoT directories built on LDAP or relational databases.

Does that mean you should chuck your existing IAM solution? Of course not – if it is working for you, that’s fantastic! That said, be aware that this change is likely coming. Over the next 3-5 years, TechVision expects a new breed of IAM solutions coupled with graph database underpinnings to begin to overtake (or be integrated with) the mainstream directory and IAM products/services we have used for the past few decades. Our core recommendation is that you begin to investigate and understand how this updated approach to managing identity data at scale is unfolding and how it might impact your organization. At a minimum, this should prepare your organization for the next wave of IAM solutions that will require scale and performance while managing increasingly complex relationships. We believe the scale and management of relationships are key as enterprises move towards becoming Digital Enterprises.

In particular, graph databases are gaining popularity in support of graph-based access control (GBAC), which offers a declarative way to define access rights, task assignments, recipients and content in information systems. Access rights are granted to objects such as files or documents, but also to business objects such as an account. Compared with role-based access control (RBAC) and attribute-based access control (ABAC), GBAC has so far been shown to return run-time authorization decisions much faster (some claim more than twice as fast). For those of you in the throes of developing a new CIAM infrastructure to replace an aging or under-performing platform, we strongly urge you to consider CIAM solutions incorporating graph technology. For ‘Microsoft shops’, the writing is on the wall, and it would behoove you to start your journey with an up-to-date mindset of where both Microsoft and the IAM industry in general are headed. In this report, we have seen how access control policies, in particular, may lend themselves very well to management within a graph database.

We hope you have found this report useful. TechVision Research feels that graph database technology usage will only increase, and in the case of IoT implementations – it may increase quickly and dramatically. There are already some very good tools on the market, so the time is right to begin thinking about your ‘Next-gen IAM’ solution being built on a graph database foundation. In this light, we encourage you to familiarize yourself with graph database and GraphQL technology by bringing some in-house and ‘playing with it in a sandbox’. Perhaps a small ‘tiger team’ can be formed in order to build some meaningful expertise in the use of this technology for IAM, whether consumer focused (CIAM), for improved access control policy management or for IoT scenario testing.

About TechVision

World-class research requires world-class consulting analysts and our team is just that. Gaining value from research also means having access to research. All TechVision Research licenses are enterprise licenses; this means everyone that needs access to content can have access to content. We know major technology initiatives involve many different skillsets across an organization and limiting content to a few can compromise the effectiveness of the team and the success of the initiative. Our research leverages our team’s in-depth knowledge as well as their real-world consulting experience. We combine great analyst skills with real world client experiences to provide a deep and balanced perspective.

TechVision Consulting builds off our research with specific projects to help organizations better understand, architect, select, build, and deploy infrastructure technologies. Our well-rounded experience and strong analytical skills help us separate the “hype” from the reality. This provides organizations with a deeper understanding of the full scope of vendor capabilities, product life cycles, and a basis for making more informed decisions. We also support vendors in areas such as product and strategy reviews and assessments, requirement analysis, target market assessment, technology trend analysis, go-to-market plan assessment, and gap analysis.

TechVision Updates will provide regular updates on the latest developments with respect to the issues addressed in this report.

About the Authors

Doug Simmons brings more than 25 years of experience in IT security, risk management and identity and access management (IAM). Doug holds a double major in Computer Science and Business Administration.

While leading consulting at Burton Group for 10 years, and security and identity management consulting at Gartner for 5 years, Doug has performed hundreds of engagements for large enterprise clients in multiple vertical industries including financial services, health care, higher education, federal and state government, manufacturing, aerospace, energy, utilities and critical infrastructure.

Archie Reed has a career of more than 25 years spanning many evolutions of the technology industry, from identity management to cloud computing and from machine learning to DevOps security. He worked on the early development of standards in OASIS, IETF and other standards groups, introduced the early concepts of “Context Based Identity Management”, which formed the basis of many identity-based security solutions, and was engaged in the early evolution of the Cloud Security Alliance. Archie has authored a number of well-referenced books including “The Definitive Guide to Identity Management” and “Silver Clouds, Dark Linings: The Executive Guide to Cloud Computing”.
