Python Tutorial Part 6 | Data Structures: Sets

Wrapping up the data structures portion of our journey are sets. Sets are probably the least frequently used of these data structures, but don’t write off the set just yet. Its efficiency and simplicity can add real value in manufacturing data processing.

The set is one of the more niche data structures built into Python. Sets closely mirror mathematical sets, and they are somewhat more abstract than the other data structures we have covered. However, that close tie to set theory brings real benefits. Checking whether an element is in a set takes roughly constant time on average, whereas a list has to be scanned element by element, and sets can quickly find the differences and similarities between groups of items.

What is a Set in Python?

A set is a mutable data structure; in other words, elements can be added or removed. However, the elements themselves must be immutable (more precisely, hashable). This means you will mostly see sets made up of the primitive data types we discussed earlier, such as integers, floats, and strings. Sets also do not keep track of the order of their elements, so printing or looping over a set may list them in a different order than you wrote them. Lastly, sets cannot contain duplicate elements.
To define a set, curly brackets are used and objects are separated by commas. Let’s take an abstracted example of fruit below:
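A minimal sketch of such a set, using fruit names chosen purely for illustration. It also shows the hashability rule from above: a list is mutable, so it cannot be placed inside a set.

```python
# A set literal: curly braces with comma-separated elements.
fruits = {"apple", "banana", "cherry", "orange"}

# Note: {} on its own creates an empty dict, so an empty set is written set().
empty = set()

# Elements must be hashable. A list is mutable (unhashable), so this fails.
try:
    bad = {["apple", "banana"]}
except TypeError as err:
    error_message = str(err)

print(error_message)  # unhashable type: 'list'
```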
Let’s print the set variable and observe what happens to the order of fruits:
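A short sketch with the same illustrative fruit names. The exact printed order is not shown because it is not guaranteed; when a predictable order is needed for display, sorting a copy into a list is one simple option.

```python
fruits = {"apple", "banana", "cherry", "orange"}

# Printing shows every element, but not necessarily in the order written above.
# For strings the order can even change between runs, because Python
# randomizes string hashing per interpreter process by default.
print(fruits)

# sorted() returns a new list in a predictable (alphabetical) order.
print(sorted(fruits))  # ['apple', 'banana', 'cherry', 'orange']
```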
The order may not match the order in which we defined the set in the fruits variable. Next, let’s try adding another apple to the set to see how it responds:
The set keeps only distinct elements, so the second apple did not create a new entry.
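A minimal sketch of that test, again with illustrative fruit names. Adding an element that is already present leaves the set unchanged, and the same de-duplication happens when a set is built from a list.

```python
fruits = {"apple", "banana", "cherry", "orange"}
print(len(fruits))  # 4

# .add() inserts an element in place; "apple" is already present, so nothing changes.
fruits.add("apple")
print(len(fruits))  # 4

# Duplicates in any iterable passed to set() are collapsed as well.
from_list = set(["apple", "apple", "banana"])
print(len(from_list))  # 2
```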

Figure 1. Fruit in control automation?! An abstract image to illustrate a set in Python.

Set Methods

I’ll ditch the fruit example in favor of something more applicable to what a controls engineer may encounter in real life. Does a mountain of Excel files full of alarm data sound familiar? Days, weeks, or even months of files pile up and must then be analyzed to decide where to focus efforts on improving an automation asset.
This example will not map one-for-one onto an Excel file, but the general concept holds. Let’s look at a single day’s alarms, each identified by a universally unique identifier (UUID) and stored in a set assigned to a “monday_alarms” variable:
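A minimal sketch of such a set. The choice of three alarms is arbitrary and only for illustration; each UUID is generated at random with the standard-library uuid module.

```python
import uuid

# Each alarm is identified by a randomly generated (version 4) UUID.
monday_alarms = {uuid.uuid4(), uuid.uuid4(), uuid.uuid4()}

print(len(monday_alarms))  # 3
for alarm_id in monday_alarms:
    print(alarm_id)  # the UUID's hyphenated text form
```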

Modules – Quick Aside

If the syntax of what just happened above looks funny, that’s because the example imports a Python module. A module (or library) is a bit like a package containing a product that a manufacturer has already built for you. It saves you time: all you have to do is open the package and put the product to work. In this case, the uuid module provides the uuid4() function, which generates a random UUID. That’s the brief version; we will explore modules in more detail in a later section.
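To make the aside a little more concrete, here is a small sketch of the two common ways to use the standard-library uuid module, and of what uuid4() actually returns (a UUID object rather than a plain string).

```python
import uuid             # import the whole module; access names as uuid.<name>
from uuid import uuid4  # or import just the one function you need

alarm_id = uuid.uuid4()  # called through the module name
other_id = uuid4()       # called directly after "from ... import"

print(type(alarm_id))    # <class 'uuid.UUID'>
print(alarm_id.version)  # 4
print(len(str(alarm_id)))  # 36 (the hyphenated text form)
```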

Adding an Alarm

The machine raises a new alarm in real time, and it is added to the monday_alarms set:
To make the change clearer, the built-in len() function returns the total number of elements in a set. We can take a count before and after adding an alarm:
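A minimal sketch, assuming an illustrative starting set of three random alarm UUIDs. The .add() method inserts the new alarm in place.

```python
import uuid

monday_alarms = {uuid.uuid4(), uuid.uuid4(), uuid.uuid4()}

# A new alarm arrives from the machine; .add() inserts it into the existing set.
new_alarm = uuid.uuid4()
monday_alarms.add(new_alarm)

print(new_alarm in monday_alarms)  # True
```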
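A short sketch of the before-and-after count, again starting from three illustrative alarm UUIDs. Note that len() is a built-in function that works on any collection, not a method of the set.

```python
import uuid

monday_alarms = {uuid.uuid4(), uuid.uuid4(), uuid.uuid4()}

count_before = len(monday_alarms)
monday_alarms.add(uuid.uuid4())  # record one new alarm
count_after = len(monday_alarms)

print(count_before, count_after)  # 3 4
```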

Removing an Alarm

During data processing, one alarm is found to be illegitimate, and its specific UUID needs to be removed:
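A minimal sketch, where false_alarm is a hypothetical name for the UUID judged illegitimate. Sets offer two removal methods: .remove() raises KeyError if the element is missing, while .discard() silently does nothing in that case.

```python
import uuid

false_alarm = uuid.uuid4()  # hypothetical: the alarm later judged illegitimate
monday_alarms = {uuid.uuid4(), uuid.uuid4(), false_alarm}

# .remove() deletes the element (and would raise KeyError if it were absent).
monday_alarms.remove(false_alarm)
print(false_alarm in monday_alarms)  # False

# .discard() is the forgiving variant: no error even though it is already gone.
monday_alarms.discard(false_alarm)

try:
    monday_alarms.remove(false_alarm)
except KeyError:
    print("alarm not found")
```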

Intersection

Just like the Venn diagrams from elementary school, we can find the elements two sets have in common. Let’s say we want to find the alarms the machine raised on both Monday and Tuesday:
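A minimal sketch using hypothetical data: three alarms per day, assuming that one recurring alarm (the illustrative recurring_alarm) keeps the same UUID on both days while the others are unique.

```python
import uuid

# Hypothetical data: one alarm UUID recurs on both days.
recurring_alarm = uuid.uuid4()
monday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}
tuesday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}

# Both spellings return a new set containing the elements found in both sets.
common_operator = monday_alarms & tuesday_alarms
common_method = monday_alarms.intersection(tuesday_alarms)

print(common_operator == common_method)      # True
print(common_operator == {recurring_alarm})  # True
```

One practical difference: the method form also accepts any iterable (for example, a plain list of UUIDs), while the & operator requires both operands to be sets.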
Note that we can use either the .intersection() method or the & operator with the sets.

Union

Finding the union or merged set from the Monday and Tuesday alarms yields:
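A minimal sketch using hypothetical data: three alarms per day, with one alarm (the illustrative recurring_alarm) appearing on both days under the same UUID. Both the | operator and the .union() method return a new set and leave the originals unchanged.

```python
import uuid

# Hypothetical data: one alarm UUID recurs on both days.
recurring_alarm = uuid.uuid4()
monday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}
tuesday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}

all_alarms = monday_alarms | tuesday_alarms
same_result = monday_alarms.union(tuesday_alarms)

print(len(monday_alarms) + len(tuesday_alarms))  # 6
print(len(all_alarms))                           # 5 (the recurring alarm appears once)
print(all_alarms == same_result)                 # True
```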
Note that even though Monday and Tuesday have 6 alarms combined, the resulting set drops duplicate alarms. The result contains only the unique alarm UUIDs from the 2 days.

Difference

The difference shows which alarms were present on Monday but not on Tuesday:
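A minimal sketch using hypothetical data: three alarms per day, with one alarm (the illustrative recurring_alarm) appearing on both days. Unlike intersection and union, difference depends on operand order.

```python
import uuid

# Hypothetical data: one alarm UUID recurs on both days.
recurring_alarm = uuid.uuid4()
monday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}
tuesday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}

# Alarms seen on Monday but not on Tuesday.
monday_only = monday_alarms - tuesday_alarms
same_result = monday_alarms.difference(tuesday_alarms)

print(monday_only == same_result)      # True
print(len(monday_only))                # 2
print(recurring_alarm in monday_only)  # False

# Swapping the operands gives the Tuesday-only alarms instead.
tuesday_only = tuesday_alarms - monday_alarms
print(monday_only == tuesday_only)     # False
```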

Symmetric Difference

The symmetric difference goes a step further and also includes the alarms present on Tuesday but not on Monday. In other words, it returns every alarm that occurred on exactly one of the two days:
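A minimal sketch using hypothetical data: three alarms per day, with one alarm (the illustrative recurring_alarm) appearing on both days. The last comparison checks that the symmetric difference equals the union of the two one-sided differences.

```python
import uuid

# Hypothetical data: one alarm UUID recurs on both days.
recurring_alarm = uuid.uuid4()
monday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}
tuesday_alarms = {recurring_alarm, uuid.uuid4(), uuid.uuid4()}

# Alarms that occurred on exactly one of the two days.
one_day_only = monday_alarms ^ tuesday_alarms
same_result = monday_alarms.symmetric_difference(tuesday_alarms)

# Equivalent to joining the two one-sided differences.
both_differences = (monday_alarms - tuesday_alarms) | (tuesday_alarms - monday_alarms)

print(one_day_only == same_result == both_differences)  # True
print(len(one_day_only))                                # 4
print(recurring_alarm in one_day_only)                  # False
```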

Wrapping Up With Data Structures

We have now covered Python’s data types and data structures quite thoroughly. Great job sticking around and learning! The next sections (forthcoming) will cover loops and control flow.

