How to select the right IoT database architecture
Understanding the core components of IoT databases is essential when it comes to choosing the one that's best for a particular IoT initiative.
Organizations have many options to choose from when designing an IoT database, but technologists must decide the best fit by evaluating the different IoT database architectures, such as static vs. streaming vs. time-series and SQL vs. NoSQL.
The right IoT database depends on the requirements of each IoT project. The first step is to factor the critical characteristics of IoT into the choice among database architectures. IoT technologists must determine the types of data to be stored and managed; the data flow; the functional requirements for analytics, management and security; and the performance and business requirements.
After identifying the organization's requirements for a database, IT admins must assess the IoT database architectures and how each supports or hinders those IoT data needs.
Understand static and streaming IoT database architectures
Start by understanding the fundamental distinction between static and streaming databases. Static databases, also known as batch databases, manage data at rest. Data that users need to access resides as stored data managed by a database management system (DBMS). Users make queries and receive responses from the DBMS, which typically, but not always, uses SQL. A streaming database handles data in motion. Data constantly streams through the database and is evaluated against a continuous series of standing queries, typically written in a language specific to the streaming database. The streaming database's output can be stored elsewhere, such as in the cloud, and accessed via standard query mechanisms.
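The difference between querying data at rest and data in motion can be sketched in a few lines of Python. This is an illustration only, not how any particular product works: the sensor readings, the function names static_query and streaming_query, and the 30-degree threshold are all assumptions made for the example. SQLite stands in for a static DBMS, and a Python generator stands in for a streaming engine.

```python
import sqlite3

# Hypothetical sensor readings: (device_id, temperature in Celsius).
READINGS = [("s1", 20.5), ("s2", 31.0), ("s1", 33.2), ("s2", 19.8)]


def static_query(readings, threshold):
    """Data at rest: store everything first, then pose a one-off SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    rows = conn.execute(
        "SELECT device_id, temp_c FROM readings WHERE temp_c > ? ORDER BY rowid",
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


def streaming_query(stream, threshold):
    """Data in motion: a standing query checked against each reading on arrival."""
    for device_id, temp_c in stream:
        if temp_c > threshold:
            # Emit the match immediately; nothing is stored by the query itself.
            yield (device_id, temp_c)


hot_at_rest = static_query(READINGS, 30.0)
hot_in_motion = list(streaming_query(iter(READINGS), 30.0))
```

Both calls find the same two readings here. The practical difference is timing: the static query answers only after the data has been loaded, while the streaming query can react to each reading as it passes through.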
Streaming databases are typically distributed to manage the scale and load requirements of vast volumes of data. Currently, there is a range of commercial, proprietary and open source streaming databases. These include well-known platforms such as Amazon Kinesis, Apache Kafka, Azure Stream Analytics, Google Cloud Dataflow, IBM InfoSphere Streams and Microsoft StreamInsight. There are also a host of new startups, including Materialize and Rockset.
These platforms might be "pure" streaming databases, which are optimized for real-time decision-making and near-instantaneous latency, but more often, they are unified databases, which include both a streaming component and a static component. The benefit of the static component is that it's based on standard query techniques and schemas. Thus, unified databases combine the best of both worlds: the real-time capabilities of a streaming database and the flexibility of a static database's query process and schema.
For IoT, the best database for most applications is a unified database. For this reason, most popular vendors' offerings include both streaming and static components.
Explore more nuanced database architectures
Time-series databases are, in many respects, based on the same technology as streaming databases but were developed with a slightly different focus. Time-series databases are more tactical. They typically involve implementing specific indexing techniques over NoSQL databases with the goal of enabling high-performance event processing. Streaming databases are more comprehensive, enabling a broader portfolio of data analyses, such as machine learning or windowing. Vendors such as InfluxData, Grafana and Prometheus offer time-series databases, as do larger players, such as Amazon, Google, IBM and Microsoft.
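Windowing, one of the analyses mentioned above, groups timestamped events into intervals so that each interval can be summarized. The following Python sketch shows a simple tumbling (fixed, non-overlapping) window average. The function name tumbling_window_avg, the integer-second timestamps and the sample events are assumptions for illustration; production databases implement this with their own indexing and query syntax.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average (timestamp, value) events per fixed, non-overlapping window.

    Timestamps are assumed to be integer seconds. Each event is assigned to
    the window whose start is the timestamp rounded down to a multiple of
    window_seconds.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    # Return windows in time order, keyed by window start.
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical readings: (seconds since start, sensor value).
events = [(0, 10.0), (20, 20.0), (65, 30.0), (70, 50.0), (130, 5.0)]
averages = tumbling_window_avg(events, 60)
```

With a 60-second window, the five sample events fall into three windows starting at 0, 60 and 120 seconds. Keying data by time bucket in this way is also the basic idea behind the time-oriented indexing that time-series databases rely on.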
SQL vs. NoSQL?
SQL databases are relational and feature static schemas that describe how the information is organized. This makes them highly manageable. However, they can be difficult to scale effectively. NoSQL databases are nonrelational, typically don't enforce a fixed schema, and are promoted as highly scalable and better performing than SQL databases.
Some tech professionals might think that a NoSQL database would be the obvious choice because scalability is essential for many IoT uses. But scalability and performance are only two factors that technologists need to consider when selecting databases. A critical factor in many scenarios is ease of integration into existing systems, where SQL is more effective. Many IoT tools and systems assume SQL. This is particularly true in industrial environments that are based on older message protocols or industrial automation platforms.
The ability to create and manage schemas is also a plus. Although technologists might find schema development to be constraining, information must be organized. Putting in the effort to develop schemas up front saves the significant effort otherwise needed later to organize data in a schemaless environment.
The choice between SQL and NoSQL can also complicate the task of combining static and streaming databases. In theory, a static or streaming database could be either SQL or NoSQL. In practice, most databases are built as one or the other. IoT technologists interested in a particular unified database might find their SQL vs. NoSQL decision driven by the design of the database.
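To make the schema tradeoff concrete, the following Python sketch contrasts a schema-enforcing SQL table with a schemaless list of documents. It is a simplified illustration: SQLite stands in for a SQL database, a plain Python list stands in for a NoSQL document store, and the table layout, field names and the helper insert_reading are all hypothetical.

```python
import sqlite3

# Schema-first: the table definition rejects malformed readings at write time.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        device_id TEXT NOT NULL,
        temp_c    REAL NOT NULL CHECK (typeof(temp_c) = 'real')
    )
    """
)


def insert_reading(conn, device_id, temp_c):
    """Return True if the row satisfies the schema, False if it is rejected."""
    try:
        conn.execute("INSERT INTO readings VALUES (?, ?)", (device_id, temp_c))
        return True
    except sqlite3.IntegrityError:
        return False


good_accepted = insert_reading(conn, "s1", 21.5)
bad_accepted = insert_reading(conn, "s2", "hot")  # violates the CHECK constraint

# Schemaless stand-in: anything can be appended, so cleanup happens at read time.
documents = [
    {"device_id": "s1", "temp_c": 21.5},
    {"device_id": "s2", "temperature": "hot"},  # accepted silently
]
clean = [d for d in documents if isinstance(d.get("temp_c"), float)]
```

The schema-enforcing table refuses the malformed reading immediately, whereas the schemaless store accepts it and leaves every consumer to filter it out later. That deferred cleanup is the later effort the paragraph above refers to.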
Whether an organization should choose a SQL or NoSQL database depends on the broader set of functional and technical requirements, particularly scalability, performance and the need to integrate into legacy systems.
Securing the IoT database
Last, but far from least, is the issue of database security. Although it's not a key component of database architecture, understanding the security capabilities and characteristics of the IoT database or databases is critical to preventing security breaches, as the 2019 exposure of an unsecured user database at Chinese smart home company Orvibo showed. This includes not just operational security -- e.g., making sure the database password is set and privileges are clearly defined -- but also confirming where and under what circumstances data is encrypted. In motion? At rest? Also, organizations should confirm how IoT devices are authenticated to ensure the database isn't inadvertently scooping up malware.
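As one illustration of device authentication, the Python sketch below checks a keyed hash (HMAC-SHA256) on each message before it would be written to the database. This is only one of several possible approaches and is not drawn from any specific IoT platform. The device ID, the shared key, the message layout and the function names sign and accept_message are assumptions made for the example; real deployments would keep keys in a secure store and often use certificates instead.

```python
import hashlib
import hmac

# Hypothetical per-device shared secrets; in practice these would come from
# a secure key store, never from source code.
DEVICE_KEYS = {"sensor-001": b"example-secret-key"}


def sign(device_id: str, payload: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the device ID and the payload."""
    message = device_id.encode("utf-8") + b"|" + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def accept_message(device_id: str, payload: bytes, signature: bytes) -> bool:
    """Accept a message only if it comes from a known device with a valid tag."""
    key = DEVICE_KEYS.get(device_id)
    if key is None:
        return False  # unknown device: never ingest
    expected = sign(device_id, payload, key)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)


payload = b'{"temp_c": 21.5}'
tag = sign("sensor-001", payload, DEVICE_KEYS["sensor-001"])
valid = accept_message("sensor-001", payload, tag)
tampered = accept_message("sensor-001", b'{"temp_c": 99.9}', tag)
unknown = accept_message("sensor-999", payload, tag)
```

Only the untampered message from the registered device passes the check. A gate like this in front of the ingest path is one way to keep data from unauthenticated or spoofed devices out of the database.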