The re library requires that a pattern and the data it is matched against share the same type. If you try to use a string pattern on an object which is stored using the “bytes” data type, you’ll encounter the “TypeError: cannot use a string pattern on a bytes-like object” error.
This guide talks about what this error means and why you may encounter it. We’ll walk you through an example of this error so you can see what steps to take to resolve the error.
TypeError: cannot use a string pattern on a bytes-like object
Bytes objects contain a sequence of single bytes. They are immutable, like strings, which means they cannot be changed. A bytes object is typically returned when you read a binary file, or when you use a library like “urllib.request” to retrieve data from a website.
When you are using the re library, the pattern and the data you search must be of the same type: either both bytes or both strings. You cannot use a string pattern on a bytes object, and vice versa.
If you are working with bytes data, your program must specify a regex pattern in bytes. If you are using regex with strings, provide a string-based regex pattern.
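As a minimal sketch of this rule, the snippet below runs the same regex against a string and a bytes object. The sample data is made up for illustration and is not taken from a real web page.

```python
import re

text_data = "<title>Example</title>"    # str
bytes_data = b"<title>Example</title>"  # bytes

# Matching types work: str pattern on str data, bytes pattern on bytes data.
str_match = re.search(r"<title>(.*?)</title>", text_data)
bytes_match = re.search(rb"<title>(.*?)</title>", bytes_data)
print(str_match.group(1))
print(bytes_match.group(1))

# Mixing a str pattern with bytes data raises TypeError.
try:
    re.search(r"<title>(.*?)</title>", bytes_data)
except TypeError as err:
    mixed_error = str(err)
    print(mixed_error)
```

The first two searches succeed because the pattern type matches the data type. The third reproduces the error this guide is about.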
An Example Scenario
We’re going to write a program that retrieves the title of a web page. We work with the Career Karma website for this tutorial.
To start, let’s import the two libraries we’ll need to build our program: urllib and re.
import urllib.request
import re
The urllib library lets us make web requests and the re library gives us the ability to use regex in our program.
Next, we make a web request to the Career Karma homepage:
with urllib.request.urlopen("https://careerkarma.com") as res:
    home = res.read()
The program retrieves the contents of the Career Karma homepage. This data is read using the read() method of the response object that urlopen() returns. We store this data in the “home” variable.
Now that our program has the data, we can use the search() method to find the contents of the <title> tag on the web page that we have queried. This tag contains the title of a web page.
To find the title of the web page, use the re.search() method:
pattern = "<title>(.*?)</title>"
find_title = re.search(pattern, home)
print("The title of the web page is: {}.".format(find_title))
Our program will search for the contents of the <title> tag. Our program then prints the result of the search to the console. We use the .format() method to add this result into our string.
Let’s run our program and see if it works:
Traceback (most recent call last):
  File "main.py", line 9, in <module>
    find_title = re.search(pattern, home)
  File "/usr/lib/python3.8/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object
Our program fails to execute fully.
The Solution
The value of “home” (the response from our web page) is a bytes object but the pattern we use to find the title of a web page is a string. This causes an error because we cannot match string patterns against bytes objects.
There are two ways we can solve this problem.
Solution #1: Convert String Pattern to Bytes
We have to convert the string pattern we use to a bytes object. We can do this using either the “b” prefix or the bytes() constructor. Note that bytes() needs an encoding when it is given a string:
pattern = b"<title>(.*?)</title>"
pattern = bytes("<title>(.*?)</title>", "utf-8")
The first method, using the “b” prefix, is more common because it is easier to read. Now that we’ve converted our string pattern to bytes, we can run our code:
The title of the web page is: <re.Match object; span=(4037, 4124), match=b'<title>Career Karma - Discover the Best Career Ad>.
Our code returns a match object that contains the text matching our pattern.
Now that we have the regex response, we could parse it so that it appears just as a string in our code. Parsing regex data is outside of the scope of this tutorial.
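Although full parsing is beyond this guide, a brief sketch of pulling the captured text out of a match object may help. It assumes a small made-up bytes value, sample_home, standing in for the real response, and assumes the page is UTF-8 encoded.

```python
import re

# Hypothetical stand-in for the bytes returned by res.read().
sample_home = b"<html><head><title>Example Page</title></head></html>"

pattern = b"<title>(.*?)</title>"
find_title = re.search(pattern, sample_home)

# re.search() returns None when nothing matches, so check before using it.
if find_title is not None:
    # group(1) is the first capture group; it is bytes because the input was bytes.
    title = find_title.group(1).decode("utf-8")
else:
    title = None

print("The title of the web page is: {}.".format(title))
```

Because the search ran on bytes, the captured group is also bytes, which is why the sketch decodes it before printing.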
Solution #2: Decode Web Page Data
Alternatively, we could opt to decode our web page data to make it a string. This is useful if you expect a string for other parts of your code to work.
We can decode our web page data by modifying the line of code where we open the web page:
with urllib.request.urlopen("https://careerkarma.com") as res:
    home = res.read().decode("utf-8")
This code will decode the response from our web request so that we can treat the response like a string. You should replace “utf-8” with the encoding that the web page you are requesting uses.
We can then use a string pattern to search for the title tag. There is no need to convert our pattern to a bytes object because “home” will be a string value.
pattern = "<title>(.*?)</title>"
find_title = re.search(pattern, home)
print("The title of the web page is: {}.".format(find_title))
Let’s run our code and see what happens:
The title of the web page is: <re.Match object; span=(4037, 4124), match='<title>Career Karma - Discover the Best Career Ad>.
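The encoding does not have to be hard-coded. One possible approach, sketched below, is to take the charset the server declares and fall back to UTF-8 when none is given. The helper name decode_body and the Latin-1 sample body are hypothetical and used only for illustration.

```python
import re

def decode_body(raw, charset=None):
    """Decode raw bytes using the given charset, falling back to UTF-8."""
    return raw.decode(charset or "utf-8", errors="replace")

# With a real request this might look like:
#   with urllib.request.urlopen(url) as res:
#       home = decode_body(res.read(), res.headers.get_content_charset())
# Here a hypothetical Latin-1 encoded body stands in for the response.
raw_body = "<title>Café</title>".encode("latin-1")
home = decode_body(raw_body, "latin-1")

find_title = re.search("<title>(.*?)</title>", home)
print(find_title.group(1))
```

The errors="replace" argument is a design choice for this sketch: it swaps undecodable bytes for a placeholder character instead of raising an exception, which may or may not suit your program.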
Conclusion
The “TypeError: cannot use a string pattern on a bytes-like object” error is raised when you try to match a string pattern to an object which is stored using the bytes data type.
You can fix this error either by converting your string pattern to a bytes object, or by converting the data with which you are working to a string object.
Now you’re ready to fix this Python error like a pro!