A Guide to SQL and NoSQL Historians

SCADA systems rely on historians - data hubs that archive site operations. Choosing the right historian software can be tricky. Nor-Cal explores the two main historian options, SQL and NoSQL, to help with selecting the perfect fit for your project's data needs.

Historian Database Types

It is recommended that any site large enough to use a SCADA system include either a stand-alone or virtual historian. This local historian is considered a vital component of the SCADA rack and is used to log the site’s data. While many of these sites might also use an off-site or cloud-based historian, this does not mean that the on-site historian can or should be discarded. That off-site system will need the local one to pull data from. It is also always possible that an act of God may cause off-site connections to go down, at which point the on-site data would be able to act as a back-up. Additionally, the on-site system can be used for in-depth analytics and reporting that can be vital to the function of the plant.

Given the importance of the historian in the function and efficiency of a plant, it is important to select one that is most capable of handling the site’s needs. The type of data that is logged in to the historian, the frequency with which it is collected, and the time it will need to be retained are all factors that are determined based on each individual site’s needs. The historian software that handles this information will be selected based on those requirements and a few other factors.

Generally, when selecting a historian, there are two types of software languages to consider: SQL or NoSQL. Unlike proprietary process historian software, these open-source historians allow a project owner greater customization and flexibility in meeting their site’s needs.

How they work

Fundamentally, SQL and noSQL are programming languages used for database management systems (DBMS) used to organize and store information. These systems are vital to data analysis and trending. Both languages are popular for their ease of use and relatively flexible design.

SQL (Structured Query Language) is a programming language used with relational databases in which data is stored in rows and tables. SQL is a non-procedural language, meaning it uses functional language and simplified semantics and can return any datatype or value. This makes it more accessible for non-programmers to be involved in customizing it to a site’s individual requirements, although making a query will still require some programming knowledge.

As the data is stored in an SQL historian, it is linked to provide relationships and context and then information can be queried. Typically, SQL databases can only scale data vertically when organizing the data collected, as horizontal scaling is not well supported. SQL components include databases, tables, queries, constraints, stored procedures, and transactions. This structure makes for a database that is often considered easy to use and manage but is less flexible in its design.

The term “NoSQL” is used to refer to both non-SQL databases and databases that use SQL in addition to other structures and languages. NoSQL is a non-relational database that allows for data to be stored in structures other than rows and tables. It can often support SQL programming language, but it is not limited to that structure. This structure is not relational and allows for non-traditional storage and querying and better lends itself to horizontal scaling.

Because NoSQL can be more varied in its syntax, it is often not as simple for non-programmers or laymen to be involved in the programming and set-up process. Fortunately, many historian programs that utilize NoSQL have been designed to be user friendly to allow for better collaboration and ease of use. The less rigid structure of NoSQL historians can also make it easier to add new fields and points as necessary.

Scalability and Data Compression

SCADA servers often handle a large variety of data points from different kinds of hardware across the plant such as inverters, trackers, MET stations, and battery storage systems. They may collect points data as often as every one second and can often be expected to retain that data on-site for a year or more. As such, the ability to scale and compress data can be an important aspect to consider.

As established, SQL databases do not lend themselves to horizontal scaling and are most often only vertically scaled. This provides less flexibility in the way data is stored, which means that data does not compress as much in SQL historians. These historians can still collect data for long periods of time, but they will need more storage to do so. This can add to the cost and take up space in the SCADA rack.

Meanwhile, the less rigid set-up of NoSQL and the ability to scale horizontally means that it is better able to increase load and compress data. Because of this, NoSQL databases can offer an advantage for high data loads in terms of data compression. This means that NoSQL databases often require less storage than strictly SQL historians.

For smaller sites, sites with fewer data points collected, or sites that require shorter data retention, this may not factor into the historian type as much. But, for larger sites and sites that collect or retain more data points, NoSQL may be a better option.

Data Visualization

The context and insights that can be provided from the data collected in the historian can play an important role across a site. The data can be used to understand and optimize the efficiency of a site, to track weather trends, to determine maintenance schedules, and much more. Historians can play an important role not just in collecting and organizing data, but also in trending and visualization.

SQL historians generally offer at least basic trending, but many programs need to work with other programs and tools to provide the visualization that can present the most valuable insights. Additionally, they often require a programmer to handle any queries and then translate the data into a more useful format.

NoSQL historians may offer a wider range of built-in data visualization tools to provide information as needed. Because of these tools, many reports, trends, and data pulls can be set up to run automatically and are translated into useful information by the program without the need for a programmer.

A SCADA rack will often come with built-in analytics tools, data visualization software, and comprehensive, custom-designed HMIs. The right historian will be used alongside these tools to smoothly operate a plant and run reports.

IT and Cost

Of course, when working as part of a larger system, there are other considerations that need to be taken into account. The budget on a project is always going to be a key factor. Cost vs. function will need to be considered when selecting what is best for a project.

The added flexibility, built-in tools, and greater expertise needed to program or create the software for a NoSQL historian means that those programs tend to be more expensive, at least for the upfront cost. The simplicity that comes with an SQL historian tends to result in less expensive software. However, that may not always pay off in the long run. An SQL historian might end up requiring additional storage or the hiring of more experts to work with it, which can add to the overall cost through the life of a plant.

While it would be simpler to only consider a single plant in the SCADA design, that often cannot be the case. It is important to remember that the plant being constructed will most likely be a part of a larger utility structure. Therefore, the historian selection must consider the existing IT structure and what software it may be working with outside of that single site.

If the local historian is linked to a larger system and off-site historian, it may be better to match the existing programming languages and historians used across other sites. If a single company is managing multiple sites, it will be much easier to integrate the new site into the company’s system if it is using similar software and programming languages. In these cases, it will be better from an IT perspective to keep to that existing structure as much as possible.

Whether to select an SQL historian software, such as Microsoft’s SQL, or a NoSQL software, such as Canary Lab’s Historian, can be decided by any of the above factors. It all depends on what is best for the site or most important to the client. Generally, SQL is the preferred historian for smaller or more simplified sites and the inverse is true for NoSQL. That may vary based on the points made in this post.

For example, a large site storing two years of data may still use SQL as the historian based on the site owner requirements. In a case like this, the owner may already have programmers available to make any queries necessary, and they will simply request the SCADA be supplied with a suitable amount of storage space to meet their needs.

On the other hand, a small site can still use a NoSQL historian. The owner may have it as the standard for their systems, or they could simply prefer it for other reasons. What is most important is to understand the requirements of each software and to make sure the best selection is being made for each site.

Let’s ensure your facility operates at peak efficiency. Contact our team to discuss and explore your project’s requirements.