In Python, strings are immutable sequences of Unicode characters that are human-readable; they become bytes only once encoded with a specific character encoding, such as UTF-8. Bytes, by contrast, represent raw binary data. A bytes object is immutable and consists of a sequence of bytes (8-bit values). In Python 3, string literals are Unicode by default, while bytes literals are prefixed with a b.
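As a quick illustration of the difference, here is a minimal sketch (the variable names are only illustrative) that compares a string and a bytes object side by side:

```python
# A str holds Unicode characters; a bytes object holds raw 8-bit values.
text = "naïve"   # string literal (Unicode by default in Python 3)
raw = b"naive"   # bytes literal, note the b prefix (ASCII characters only)

print(type(text), len(text))  # length counts characters
print(type(raw), len(raw))    # length counts bytes

# Indexing a str gives a one-character str; indexing bytes gives an int (0-255).
first_char = text[0]
first_byte = raw[0]
print(first_char, first_byte)

# Once encoded, the same text can take more bytes than it has characters.
encoded = text.encode("utf-8")
print(len(text), len(encoded))
```

The last line prints two different lengths because the non-ASCII character needs more than one byte in UTF-8.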
Converting bytes to strings is a common task in Python, particularly when working with data from network operations, file I/O, or responses from certain APIs. This is a tutorial on how to convert bytes to strings in Python.
1. Convert Bytes to String Using the decode() Method
The most straightforward way to convert bytes to a string is using the decode() method on the bytes object (or the byte string). This method takes the character encoding to use, which defaults to UTF-8 if omitted.
Note: Strings do not have an associated binary encoding and bytes do not have an associated text encoding. To convert bytes to a string, you can use the decode() method on the bytes object. And to convert a string to bytes, you can use the encode() method on the string. In either case, specify the encoding to be used.
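The relationship between encode() and decode() can be sketched as a round trip. This is a minimal example with an arbitrary sample string; it also shows that decoding with a different encoding than the one used to encode may succeed yet produce the wrong text:

```python
original = "café"

# str -> bytes: encode with an explicit encoding
encoded = original.encode("utf-8")
print(encoded)  # b'caf\xc3\xa9'

# bytes -> str: decode with the same encoding to get the text back
round_trip = encoded.decode("utf-8")
print(round_trip == original)  # True

# Decoding with a different encoding does not fail here, but yields wrong text
mismatched = encoded.decode("latin-1")
print(mismatched)
```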
Example 1: UTF-8 Encoding
Here we decode the UTF-8-encoded byte_data to a string using the decode() method:
# Sample byte object
byte_data = b'Hey, World!'
# Converting bytes to string
string_data = byte_data.decode('utf-8')
print(string_data)
You should get the following output:
Output >>>
Hey, World!
You can verify the data types before and after the conversion like so:
print(type(byte_data))
print(type(string_data))
The data types should be as expected:
Output >>>
<class 'bytes'>
<class 'str'>
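The same type check is useful when a function may receive either type. The following is a minimal sketch; ensure_text is a hypothetical helper name chosen for illustration, not part of the standard library:

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes or bytearray."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(ensure_text(b"Hey, World!"))  # decoded from bytes
print(ensure_text("Hey, World!"))   # already a str, returned unchanged
```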
Example 2: Handling Other Encodings
Sometimes, the byte sequence may use an encoding other than UTF-8. You can handle this by specifying the corresponding encoding when you call the decode() method on the bytes object.
Here's how you can decode a byte string with UTF-16 encoding:
# Sample byte object
byte_data_utf16 = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00,\x00 \x00W\x00o\x00r\x00l\x00d\x00!\x00'
# Converting bytes to string
string_data_utf16 = byte_data_utf16.decode('utf-16')
print(string_data_utf16)
And here's the output:
Output >>>
Hello, World!
Using Chardet to Detect Encoding
In practice, you may not always know the encoding scheme used, and mismatched encodings can lead to errors or garbled text. So how do you get around this?
You can use the chardet library (install it using pip: pip install chardet) to detect the encoding, and then pass the detected encoding to the decode() method call. Here's an example:
import chardet
# Sample byte object with unknown encoding
byte_data_unknown = b'\xe4\xbd\xa0\xe5\xa5\xbd'
# Detecting the encoding
detected_encoding = chardet.detect(byte_data_unknown)
encoding = detected_encoding['encoding']
print(encoding)
# Converting bytes to string using detected encoding
string_data_unknown = byte_data_unknown.decode(encoding)
print(string_data_unknown)
The output shows the detected encoding followed by the decoded string. Detection is a best guess, so results can vary, especially for short inputs.
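If installing a third-party detector is not an option, one alternative is to try a short list of candidate encodings in order. This is a different technique from chardet's detection, sketched here with the standard library only; decode_with_fallback and its default candidate list are assumptions made for illustration:

```python
def decode_with_fallback(data, candidates=("utf-8", "utf-16", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b'\xe4\xbd\xa0\xe5\xa5\xbd')
print(used, text)

# Bytes starting with a UTF-16 byte order mark are not valid UTF-8,
# so the second candidate is used instead.
text16, used16 = decode_with_fallback(b'\xff\xfeH\x00i\x00')
print(used16, text16)
```

Order matters in this approach: latin-1 accepts any byte sequence, so it only makes sense as a last resort, and a successful decode does not guarantee that the text is what the sender intended.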
Error Handling in Decoding
The bytes object that you're working with may not always be valid; it may sometimes contain sequences that are invalid for the specified encoding. This will lead to errors.
Here, byte_data_invalid contains the invalid sequence \xff:
# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hey, World!\xff'
# Try converting bytes to string
string_data = byte_data_invalid.decode('utf-8')
print(string_data)
When you try to decode it, you'll get the following error:
Traceback (most recent call last):
  File "/home/balapriya/bytes2str/main.py", line 5, in <module>
    string_data = byte_data_invalid.decode('utf-8')
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 11: invalid start byte
But there are a couple of ways you can handle these errors. You can ignore such errors when decoding, or you can replace invalid sequences with a placeholder.
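Before choosing either option, it can help to catch the exception explicitly and inspect where decoding failed. This is a minimal sketch using the attributes that UnicodeDecodeError exposes (start, end, and reason); the variable names are illustrative:

```python
byte_data_invalid = b'Hey, World!\xff'

try:
    string_data = byte_data_invalid.decode('utf-8')
except UnicodeDecodeError as exc:
    # exc.start and exc.end mark the offending slice of the input
    error_position = exc.start
    bad_bytes = byte_data_invalid[exc.start:exc.end]
    print(f"Could not decode {bad_bytes!r} at position {error_position}: {exc.reason}")
    string_data = None  # signal to the caller that decoding failed
```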
Ignoring Errors
To ignore invalid sequences when decoding, you can set errors to ignore in the decode() method call:
# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hey, World!\xff'
# Converting bytes to string while ignoring errors
string_data = byte_data_invalid.decode('utf-8', errors="ignore")
print(string_data)
You'll now get the following output without any errors:
Output >>>
Hey, World!
Replacing Errors
You can also replace invalid sequences with a placeholder. To do this, you can set errors to replace as shown:
# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hey, World!\xff'
# Converting bytes to string while replacing errors with a placeholder
string_data_replace = byte_data_invalid.decode('utf-8', errors="replace")
print(string_data_replace)
Now the invalid sequence (at the end) is replaced by a placeholder:
Output >>>
Hey, World!�
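If you would rather keep a visible trace of the bytes that failed to decode, the errors parameter also accepts backslashreplace, which substitutes an escape sequence instead of a placeholder character. A minimal sketch:

```python
byte_data_invalid = b'Hey, World!\xff'

# Keep a readable trace of the undecodable byte instead of dropping or masking it
string_data_escaped = byte_data_invalid.decode('utf-8', errors='backslashreplace')
print(string_data_escaped)
```

The resulting string ends with the four literal characters \xff, which can be handy in logs where you want to see exactly which byte was invalid.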
2. Convert Bytes to String Using the str() Constructor
The decode() method is the most common way to convert bytes to a string. But you can also use the str() constructor to get a string from a bytes object. You can pass in the encoding scheme to str() like so:
# Sample byte object
byte_data = b'Hey, World!'
# Converting bytes to string
string_data = str(byte_data, 'utf-8')
print(string_data)
This outputs:
Output >>>
Hey, World!
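One detail worth knowing about str(): the encoding argument is what triggers decoding. Called without it, str() returns the printable representation of the bytes object rather than its text. A minimal sketch of the difference (variable names are illustrative):

```python
byte_data = b'Hey, World!'

# Without an encoding, str() returns the printable representation of the bytes object
as_repr = str(byte_data)
print(as_repr)  # includes the b prefix and the quotes

# With an encoding, str() decodes the bytes
as_text = str(byte_data, 'utf-8')
print(as_text)

# The errors argument is accepted here too
lenient = str(b'Hey, World!\xff', 'utf-8', errors='replace')
print(lenient)
```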
3. Convert Bytes to String Using the Codecs Module
Yet another way to convert bytes to a string in Python is using the decode() function from the built-in codecs module. This module provides convenience functions for encoding and decoding.
You can call the decode() function with the bytes object and the encoding scheme as shown:
import codecs
# Sample byte object
byte_data = b'Hey, World!'
# Converting bytes to string
string_data = codecs.decode(byte_data, 'utf-8')
print(string_data)
As expected, this also outputs:
Output >>>
Hey, World!
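The codecs module is also useful when bytes arrive in pieces, for example from a network socket, and a multi-byte character may be split across two chunks. The following is a minimal sketch using codecs.getincrementaldecoder(); the chunk boundaries are made up for illustration:

```python
import codecs

# '你好' encoded as UTF-8, deliberately split in the middle of each character
chunks = [b'\xe4\xbd', b'\xa0\xe5', b'\xa5\xbd']

decoder = codecs.getincrementaldecoder('utf-8')()
pieces = []
for chunk in chunks:
    # Incomplete trailing bytes are buffered until the next call
    pieces.append(decoder.decode(chunk))
# final=True flushes the decoder and raises if bytes are left dangling
pieces.append(decoder.decode(b'', final=True))

streamed_text = ''.join(pieces)
print(streamed_text)
```

Calling decode() on each chunk separately would raise a UnicodeDecodeError here, because no single chunk is valid UTF-8 on its own.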
Summary
In this tutorial, we learned how to convert bytes to strings in Python while also handling different encodings and potential errors gracefully. Specifically, we learned how to:
- Use the decode() method to convert bytes to a string, specifying the correct encoding.
- Handle potential decoding errors using the errors parameter with options like ignore or replace.
- Use the str() constructor to convert a valid bytes object to a string.
- Use the decode() function from the codecs module, which is built into the Python standard library, to convert a valid bytes object to a string.
Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.