Python has four built-in collection data types: lists, tuples, sets, and dictionaries. The Python collections module provides additional options, including namedtuple, Counter, defaultdict, and ChainMap.
Python offers four collection data types: lists, tuples, sets, and dictionaries. Each of these data types is useful in specific situations.
For example, because lists can be modified, you may want to use one to store an evolving list of student names. Or suppose you want to store a list of ice cream flavors that will never change. A tuple, whose contents cannot be modified, may be more appropriate.
Often, when you’re working in Python, you may find that these data types do not offer all the features you’re looking for. Luckily, there is a Python module that you can use to access more advanced features related to collections of data: the Python collections module.
The Python collections module was created to improve the functionality of built-in collection options and to give developers more flexibility when working with data structures. In this guide, we will break down the basics of the Python collections module and explore four of the most commonly used data structures from the module.
Collections Refresher
Collections are container data types that can be used to store data. As discussed earlier, Python’s built-in collections are lists, tuples, sets, and dictionaries. Each of these data types has its own characteristics.
Lists
A list is an ordered, mutable data type that can be used to store data that may change over time. For example, you can add, remove, and update existing list items. Lists can contain duplicate values. You can use index numbers to reference individual items within a list.
The following is an example of declaring a list in Python:
sandwiches = ["Cheese", "Ham", "Tuna", "Egg Mayo"]
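As a minimal sketch of the mutability described above, the snippet below adds, updates, and removes items. The extra sandwich names ("BLT", "Smoked Ham") are illustrative assumptions, not part of the original menu.

```python
sandwiches = ["Cheese", "Ham", "Tuna", "Egg Mayo"]

sandwiches.append("BLT")       # add an item to the end of the list
sandwiches[1] = "Smoked Ham"   # update the item at index 1
sandwiches.remove("Tuna")      # remove the first item equal to "Tuna"

print(sandwiches[0])           # index numbers start at 0
print(sandwiches)
```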
Tuples
Tuples are ordered and immutable data types. Tuples can contain duplicate values, but once a tuple is created its items cannot be changed. Tuples are surrounded by parentheses.
Here’s an example of a Python tuple:
sandwiches = ("Cheese", "Ham", "Tuna", "Egg Mayo")
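To illustrate what immutability means in practice, here is a short sketch: reading an item by index works, but assigning to an index raises a TypeError. The variable name assignment_error is just a placeholder for this example.

```python
sandwiches = ("Cheese", "Ham", "Tuna", "Egg Mayo")

print(sandwiches[2])  # reading by index works the same way as for lists

assignment_error = None
try:
    sandwiches[0] = "BLT"  # tuples do not support item assignment
except TypeError as error:
    assignment_error = error
    print("Cannot modify a tuple:", error)
```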
Sets
Sets are unordered collections of unique items. They are declared using curly braces. Unlike lists, sets do not have index values and cannot include duplicate entries.
Here’s an example of a Python set:
sandwiches = {"Cheese", "Ham", "Tuna", "Egg Mayo"}
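One common use of the no-duplicates rule is removing repeats from a list. The sketch below assumes a hypothetical list of orders that contains repeated names.

```python
# Hypothetical order list with repeated entries
orders = ["Cheese", "Ham", "Ham", "Tuna", "Cheese"]

unique_sandwiches = set(orders)  # duplicates are dropped automatically

print(len(unique_sandwiches))      # number of distinct sandwiches
print("Ham" in unique_sandwiches)  # membership test
```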
Dictionaries
Dictionaries are changeable data types that are indexed by key. Each item in a dictionary has a key and a value. Since Python 3.7, dictionaries preserve the order in which items were inserted.
Here’s an example of a Python dictionary:
sandwich = { "name": "Cheese", "price": 8.95 }
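As a brief sketch of how keys are used, the snippet below reads, updates, and adds entries. The "vegetarian" and "calories" keys are assumptions made up for illustration.

```python
sandwich = {"name": "Cheese", "price": 8.95}

print(sandwich["name"])        # look up a value by its key
sandwich["price"] = 9.25       # update an existing value
sandwich["vegetarian"] = True  # add a new key-value pair

# get() returns None (or a fallback you supply) instead of raising KeyError
calories = sandwich.get("calories")
print(calories)
```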
These four data types have a wide variety of uses in Python. However, if you’re looking to perform more advanced actions with Python container data types, the Python collections module is worth considering.
Collections Module
The Python collections module contains a number of specialized data structures that you can use in addition to, or as an alternative to, Python’s built-in containers. Because collections is a module, we have to import it into our program. However, it is part of Python’s standard library, so we do not need to install any additional packages.
In this article, we will focus on the four most commonly used data structures from the collections module. These are as follows:
- Counter
- namedtuple
- defaultdict
- ChainMap
Counter
Counter is a subclass of the dictionary object and can be used to count hashable objects. Counter() takes an iterable as an argument and returns a dictionary-like object that maps each item to the number of times it appears.
So, let’s say that we have a list of sandwich orders for January and want to know how many BLT sandwiches we sold during that month. We could use Counter() to do this.
Here’s an example of the code we would use:
from collections import Counter
sandwich_sales = ["BLT", "Egg Mayo", "Ham", "Ham", "Ham", "Cheese", "BLT", "Cheese"]
our_counter = Counter(sandwich_sales)
print(our_counter["BLT"])
Our program returns: 2.
There is a lot going on in our code, so let’s break it down.
On the first line, we import the Counter class from collections. We have to do this because Counter lives in the collections module rather than among Python’s built-in names. Then, we declare our sandwich_sales list, which stores one entry for each sandwich we sold in January.
On the next line, we declare the our_counter variable and assign it the result of Counter(sandwich_sales). This allows us to look up the count for any sandwich through our_counter.
Finally, we use print(our_counter["BLT"]) to print out how many items in our list are equal to BLT. In this case, the answer was 2.
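Beyond looking up a single count, a Counter offers a few helpers that are worth knowing. The sketch below reuses the same sales list; the "Tuna" entry is an assumption added for illustration.

```python
from collections import Counter

sandwich_sales = ["BLT", "Egg Mayo", "Ham", "Ham", "Ham", "Cheese", "BLT", "Cheese"]
our_counter = Counter(sandwich_sales)

# most_common(n) returns the n highest counts as (item, count) pairs
best_seller = our_counter.most_common(1)
print(best_seller)

# A key that was never counted reports 0 instead of raising KeyError
print(our_counter["Tuna"])

# update() adds further observations to the existing counts
our_counter.update(["BLT", "Tuna"])
print(our_counter["BLT"])
```

Because a missing key simply counts as zero, you can query any sandwich name without first checking whether it was sold.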
namedtuple
The namedtuple() factory function creates a tuple subclass with a name for each position in the tuple. When you’re working with a standard tuple, the only way you can access individual values is by referencing the tuple’s index numbers. If you’re working with a big tuple, this can quickly get confusing.
Here’s an example of using namedtuple() to store a sandwich’s name and price:
from collections import namedtuple
Sandwich = namedtuple("Sandwich", "name, price")
first_sandwich = Sandwich("Chicken Teriyaki", "$3.00")
print(first_sandwich.price)
Our program returns: $3.00.
There’s a lot going on in our code, so let’s break it down. On the first line, we import namedtuple from the collections module so that we can use it in our code.
On the next line, we create a tuple type named Sandwich and assign it two field names: name and price. This allows us to use these field names to reference the values in our tuples later on in our code. Next, we declare a variable called first_sandwich, to which we assign a Sandwich whose name is Chicken Teriyaki and whose price is $3.00.
Finally, we print out the price of our first_sandwich, which in this case is $3.00.
You can also create a namedtuple from a list. Here’s an example:
second_sandwich = Sandwich._make(["Spicy Italian", "$3.75"])
print(second_sandwich.name)
Our program returns: Spicy Italian. In this example, we call the _make() class method on Sandwich to turn our list into a namedtuple.
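A namedtuple is still a tuple, so indexing and unpacking keep working, and it stays immutable. The sketch below shows this along with two other built-in helpers, _replace() and _asdict(); the discounted price is a made-up value for illustration.

```python
from collections import namedtuple

Sandwich = namedtuple("Sandwich", "name, price")
first_sandwich = Sandwich("Chicken Teriyaki", "$3.00")

# Index access and unpacking behave as they do for a regular tuple
print(first_sandwich[1])
name, price = first_sandwich

# _replace() returns a new namedtuple; the original is left untouched
discounted = first_sandwich._replace(price="$2.50")
print(discounted.price, first_sandwich.price)

# _asdict() converts the fields and values into a dictionary
as_dict = dict(first_sandwich._asdict())
print(as_dict)
```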
defaultdict
The defaultdict class can be used to create a Python dictionary that does not throw a KeyError when you try to access a key that does not exist. Instead, if you reference a missing key, the dictionary calls a function you supplied when creating it (the default factory), stores the result under that key, and returns it.
Here’s an example that uses defaultdict to declare a dictionary that will return an empty string if we reference a non-existent key:
from collections import defaultdict
sandwiches = defaultdict(str)
sandwiches[0] = "Ham and Cheese"
sandwiches[1] = "BLT"
print(sandwiches[1])
print(sandwiches[2])
Our program returns:
BLT
// This is a blank line
In the above example, we created a dictionary with values stored under the keys 0 and 1. When we print out sandwiches[1], we can see that our dictionary stored our values. However, when we try to print out the item associated with the key 2, our program prints a blank line because there is no value assigned to that key, so the default (an empty string) is used.
In a standard dictionary, our program would raise a KeyError. However, because we used defaultdict, our program instead returns a default value produced by the type we specified when we created the dictionary. In the above example, we stated that any missing key should produce an empty str, but we could have used int (which gives 0), list (which gives an empty list), or any other callable that takes no arguments.
This can be useful when you’re working with a dictionary to perform an operation on multiple items but some keys may not exist yet. Instead of raising an error, the defaultdict will supply a default value and your program will keep running.
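A typical use of this behavior is grouping items without first checking whether a key exists. The sketch below assumes a hypothetical list of (customer, sandwich) orders and uses list as the default factory.

```python
from collections import defaultdict

# Hypothetical orders as (customer, sandwich) pairs
orders = [("Alice", "BLT"), ("Bob", "Ham"), ("Alice", "Cheese")]

orders_by_customer = defaultdict(list)
for customer, sandwich in orders:
    # A missing customer starts with an empty list, so append() always works
    orders_by_customer[customer].append(sandwich)

print(orders_by_customer["Alice"])
print(orders_by_customer["Carol"])    # no orders yet: an empty list

# Looking up a missing key also inserts it into the dictionary
print("Carol" in orders_by_customer)
```

Note the side effect on the last line: reading a missing key stores the default under that key. If you only want to check for a key, use the in operator or get() rather than square brackets.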
ChainMap
The ChainMap class is used to combine two or more dictionaries; it returns a single ChainMap object that searches the underlying dictionaries in order, without copying them. For example, let’s say that we have two menus (a standard menu and a secret menu) that we want to merge into one big menu. In order to do this, we could use ChainMap.
Here’s an example of using ChainMap() to merge our standard and secret menus:
from collections import ChainMap
standard_menu = { "BLT": "$3.05", "Roast Beef": "$3.55", "Cheese": "$2.85", "Shrimp": "$3.55", "Ham": "$2.85" }
secret_menu = { "Steak": "$3.60", "Tuna Special": "$3.20", "Turkey Club": "$3.20" }
menu = ChainMap(standard_menu, secret_menu)
print(menu)
Our code returns a ChainMap object that merged our two menus together, as follows:
ChainMap({'BLT': '$3.05', 'Roast Beef': '$3.55', 'Cheese': '$2.85', 'Shrimp': '$3.55', 'Ham': '$2.85'}, {'Steak': '$3.60', 'Tuna Special': '$3.20', 'Turkey Club': '$3.20'})
We can access each value in our ChainMap by referencing its key name. For example, here’s a line of code that allows us to retrieve the price of the BLT sandwich:
print(menu["BLT"])
Our program returns: $3.05
In addition, it’s important to note that a ChainMap reflects changes made to the dictionaries it contains. So, if you change a value in the standard_menu or secret_menu dictionaries, the ChainMap object will also show the change. Here’s an example:
print(menu)
standard_menu["BLT"] = "$3.10"
print(menu)
The first print statement shows the original menu. The second returns:
ChainMap({'BLT': '$3.10', 'Roast Beef': '$3.55', 'Cheese': '$2.85', 'Shrimp': '$3.55', 'Ham': '$2.85'}, {'Steak': '$3.60', 'Tuna Special': '$3.20', 'Turkey Club': '$3.20'})
As you can see, the price of our BLT changed from $3.05 to $3.10 because we changed its price in our standard_menu dictionary.
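Two details of ChainMap are easy to miss: when the same key appears in more than one dictionary, the first dictionary wins, and writes made through the ChainMap go to the first dictionary only. The sketch below uses shortened menus, and the overlapping "BLT" entry in the secret menu is an assumption added to show the lookup order.

```python
from collections import ChainMap

standard_menu = {"BLT": "$3.05", "Ham": "$2.85"}
secret_menu = {"BLT": "$2.50", "Steak": "$3.60"}  # hypothetical overlapping key

menu = ChainMap(standard_menu, secret_menu)

# Lookups search the dictionaries in order, so standard_menu wins for "BLT"
print(menu["BLT"])
print(menu["Steak"])

# Writes through the ChainMap only affect the first dictionary
menu["Tuna"] = "$3.20"
print("Tuna" in standard_menu, "Tuna" in secret_menu)

# Duplicate keys are counted once
print(len(menu))
```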
The ChainMap object also supports the keys() and values() methods, which retrieve the keys or values from all of the dictionaries it contains. These methods return the keys of our data (which we can use to reference a particular value) and the values they have been assigned:
print(list(menu.keys()))
print(list(menu.values()))
Our code returns the following:
['Steak', 'Tuna Special', 'Turkey Club', 'BLT', 'Roast Beef', 'Cheese', 'Shrimp', 'Ham']
['$3.60', '$3.20', '$3.20', '$3.05', '$3.55', '$2.85', '$3.55', '$2.85']
Our code returned the keys and values of each item in our ChainMap object when we used the keys() and values() methods above.
In addition, you can add a new dictionary to a ChainMap object using the new_child() method, which returns a new ChainMap and leaves the original unchanged. Let’s say that our sandwich chef has been trying out new sandwiches on a test menu and wants to add two of them to our menu. We could use the following code to achieve this goal:
test_menu = { "Veggie Deluxe": "$3.00", "House Club Special": "$3.65" }
new_menu = menu.new_child(test_menu)
print(new_menu)
Our code returns a new ChainMap with our new sandwiches in the first dictionary, as follows:
ChainMap({'Veggie Deluxe': '$3.00', 'House Club Special': '$3.65'}, {'BLT': '$3.05', 'Roast Beef': '$3.55', 'Cheese': '$2.85', 'Shrimp': '$3.55', 'Ham': '$2.85'}, {'Steak': '$3.60', 'Tuna Special': '$3.20', 'Turkey Club': '$3.20'})
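Because the new dictionary is placed first, it takes priority over the older menus for any key they share. The sketch below uses shortened menus, and the overlapping "BLT" price in the test menu is an assumption for illustration. It also uses the parents property and the maps attribute, both part of ChainMap.

```python
from collections import ChainMap

standard_menu = {"BLT": "$3.05"}
test_menu = {"BLT": "$2.95", "Veggie Deluxe": "$3.00"}  # hypothetical trial prices

menu = ChainMap(standard_menu)
new_menu = menu.new_child(test_menu)

print(new_menu["BLT"])          # the child dictionary is searched first
print(menu["BLT"])              # the original ChainMap is unchanged
print(new_menu.parents["BLT"])  # parents skips the first dictionary
print(len(new_menu.maps))       # maps lists the underlying dictionaries
```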
Conclusion
We can use the Python collections module to extend the built-in collections offered by Python and to access specialized data structures. This is helpful if you are looking to work with a collection data type, such as a list or a tuple, but need functionality that does not come with the built-in types.
In this guide, using examples, we broke down how to use collections in Python and discussed four of the most commonly used data structures offered by the module: Counter, namedtuple, defaultdict, and ChainMap.
Now you’re equipped with the knowledge you need to start working with the Python collections module like an expert!