Binary files store data as a contiguous sequence of bytes whose interpretation is not defined by the file itself being "text". This means that a program trying to read a binary file needs to know the file's format in advance. If you try to open a binary file using a normal text editor, you will notice unknown or unreadable characters popping up on your screen. This is because your editor assumes that the bytes represent characters in some text encoding. Since they do not, the editor cannot display them meaningfully.
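Here is a minimal Python sketch of what a text editor runs into. The sample bytes are an arbitrary illustration (the first eight happen to be the PNG signature), and the helper name `looks_like_text` is hypothetical. It is only a heuristic: some binary data happens to be valid UTF-8 by coincidence.

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the bytes decode cleanly in the given text encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text_bytes = "Hello, world!".encode("utf-8")
binary_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0xFF, 0xD8])

print(looks_like_text(text_bytes))
print(looks_like_text(binary_bytes))

# A lenient decoder swaps each undecodable byte for a placeholder character,
# which is roughly what an editor shows when it opens a binary file.
print(binary_bytes.decode("utf-8", errors="replace"))
```

The strict decode fails on the very first byte, because `0x89` cannot start a UTF-8 character.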
This article takes a look at what binary files are, how they differ from traditional text files, and where to use them. Let’s understand how files work before contrasting the two.
What is a Binary File?
A binary file is one that does not contain text. It is used to store data in the form of bytes, which are typically interpreted as something other than textual characters. Many binary formats begin with a header that tells a program how to read the data that follows. Binary files can be used to store any type of data in a computer.
Even though all files store data similarly, your operating system does not treat them all in the same way. This means that even though a sound file and an image are stored in the file system as continuous strings of data, an image cannot be played in a music player, and a music file cannot be opened in photo viewing software. The format of the file governs this behavior. Extensions, such as “.mp3” and “.jpg”, attached to the file names indicate the kind of data the files are expected to contain.
Broadly speaking, all files can be classified into two major formats: text and binary. Binary files encompass all non-text files, while text files are more restrictive and can only store textual data.
Binary files can store any kind of data, as long as the header of the file defines the file type and other relevant information like the block and body sizes accurately. Let us understand the differences and similarities between the two now.
Binary File vs Text File
While binary and text files store data in the form of a sequence of bits, they are very different from each other. Let’s take a moment to understand the two formats independently.
Binary Files
All files that are not used to store textual data fall into this category. Any custom file type can be created using a binary file, as long as the necessary information on how to read the file is stored in the file or known to the program reading it. A single binary file can hold multiple types of data, such as image, video, and audio. The only requirement is that a program capable of reading that kind of data is present on the system.
The PNG format is a great example of the above use-case. A PNG file can be read by most image viewers and shows graphical information. If you open a PNG file with a text editor, most of the file will be composed of unrecognizable characters. But you will also find pieces of readable text scattered all over the file. This is because the PNG file includes small sections for storing textual data along with the graphical information. Some other file formats support this too, and this is possible due to the dynamic nature of binary files.
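To make the PNG example concrete, here is a hedged Python sketch that builds a tiny 1x1 PNG in memory and walks its chunks. The helper names (`make_chunk`, `build_tiny_png`, `list_chunks`) and the comment text are made up for illustration. The chunk layout (length, type, data, CRC) follows the PNG specification.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG that also carries a small tEXt chunk."""
    # width, height, bit depth, color type (0 = grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    pixel_rows = b"\x00\x00"  # filter-type byte followed by one black pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", b"Comment\x00hello from a text chunk")
        + make_chunk(b"IDAT", zlib.compress(pixel_rows))
        + make_chunk(b"IEND", b"")
    )


def list_chunks(png: bytes):
    """Return (type, data) pairs for every chunk after the signature."""
    chunks = []
    offset = len(PNG_SIGNATURE)
    while offset < len(png):
        (length,) = struct.unpack(">I", png[offset:offset + 4])
        chunk_type = png[offset + 4:offset + 8]
        data = png[offset + 8:offset + 8 + length]
        chunks.append((chunk_type, data))
        offset += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return chunks


for chunk_type, data in list_chunks(build_tiny_png()):
    print(chunk_type.decode("ascii"), len(data), "bytes")
```

The chunk names and the contents of the `tEXt` chunk are plain ASCII, which is why they show up as readable fragments in a text editor, while the compressed pixel data in `IDAT` does not.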
Most binary formats place a header at the top of the file. This header is the key to the file. It is used to store the information that identifies the file’s content. Usually, headers contain a signature that identifies the file type, along with other metadata such as sizes and, in some formats, the date last modified. If a binary file’s header is damaged, it is equivalent to the key being lost, which means ordinary programs may no longer be able to read meaningful data from the file.
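As a rough illustration of how a program can use the start of a file as a key, the Python sketch below checks a few well-known signatures (often called magic numbers). The table is deliberately tiny and the function name `identify` is hypothetical; real tools know many more formats.

```python
# A few widely documented file signatures.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"GIF89a": "GIF image",
}


def identify(data: bytes) -> str:
    """Guess the file type from the leading bytes of the file."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


intact = b"\x89PNG\r\n\x1a\n" + b"...rest of the file..."
damaged = b"\x00" + intact[1:]  # a single corrupted byte in the signature

print(identify(intact))
print(identify(damaged))
```

With just one byte of the signature changed, the file is no longer recognized, even though every other byte is untouched.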
Text Files
Text files can be seen as a narrowed-down version of binary files. They can store textual data only. Most text files use ASCII or an ASCII-compatible encoding such as UTF-8 to store data. Text files can be viewed in any text editor. This ease of viewing makes accidental damage easier to spot, as corruption usually shows up as visibly wrong characters.
Text files support multiple formats to store formatted and plain textual data. A TXT file stores unformatted, raw text with line endings only, while the more complex RTF (Rich Text Format) can store formatted textual content, with styles like bold and italics. Beyond the bare minimum ASCII encoding standard, modern text files commonly use Unicode encodings such as UTF-8 (UTF stands for Unicode Transformation Format). Such standards allow you to store a far wider range of characters in your text files and read them easily.
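A short Python sketch of the difference between ASCII and a Unicode encoding. The helper names `utf8_hex` and `fits_in_ascii` are hypothetical, and the characters are written as escapes so the example does not depend on how this page is encoded.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of the text as a hex string."""
    return text.encode("utf-8").hex()


def fits_in_ascii(text: str) -> bool:
    """Return True if every character can be stored as plain ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# "A", e with acute accent, euro sign
for ch in ["A", "\u00e9", "\u20ac"]:
    print(repr(ch), "ascii ok:", fits_in_ascii(ch), "utf-8 bytes:", utf8_hex(ch))
```

ASCII characters keep their single-byte values in UTF-8, while characters outside ASCII take two or more bytes each.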
There are even more advanced file formats like DOC and DOCX, which combine text with binary storage to provide users with a better experience. DOC is a binary format, while DOCX, the format used by current versions of Microsoft Word, is a ZIP archive of XML files that hold the text along with metadata and styling that help Word display the content. If you want to check it out for yourself, try renaming a document.docx file to document.docx.zip, and then open it using any unzipping tool. You will find several XML files, including word/document.xml, which holds the document’s text, alongside others that store the document’s metadata.
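The same inspection can be done from code. The Python sketch below builds a small stand-in archive in memory rather than a real Word document: the member names mirror ones found in DOCX files, but the XML contents are simplified placeholders and Word would not open the result.

```python
import io
import zipfile


def build_stand_in_docx() -> bytes:
    """Build a ZIP archive whose layout loosely imitates a DOCX file (not a valid one)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document><p>Hello from the body</p></document>")
        archive.writestr("docProps/core.xml", "<coreProperties><creator>Example</creator></coreProperties>")
    return buffer.getvalue()


def list_members(archive_bytes: bytes):
    """Return the names of the files stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


data = build_stand_in_docx()
print(data[:4])            # ZIP files start with the bytes PK\x03\x04
print(list_members(data))
```

For a real document, `zipfile.ZipFile("document.docx").namelist()` lists the members without any renaming.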
Advantages of Using Binary Files
Binary files provide multiple benefits compared to plain text files. Let’s take a look at some of them:
Efficiency via Compression
Data is stored in binary files according to custom rules for use-case specific optimizations. PNG is a great example of this because it can be used to create small and efficient image files.
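PNG compresses its pixel data with DEFLATE, which Python exposes through the standard `zlib` module. The sketch below is only a rough illustration with made-up sample data; real images rarely compress this well.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


repetitive = b"\x00\xff" * 5000  # 10,000 bytes with an obvious repeating pattern
compressed = zlib.compress(repetitive)

print(len(repetitive), "bytes before compression")
print(len(compressed), "bytes after compression")

# Compression is lossless: decompressing gives back the exact original bytes.
restored = zlib.decompress(compressed)
```

The compressed bytes are themselves binary data, which is one reason formats like this cannot be plain text.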
Better Security
Once again, the customization that binary files offer allows businesses to create custom encoding standards, which can be difficult to reverse engineer. Without documentation, often the only way to read a custom-encoded binary file is to guess how data has been stored in it. Keep in mind that this obscurity only raises the effort needed to read the file and is not a substitute for encryption.
Unmatched Speed
Because values are stored in raw form and do not have to be converted to and from characters, binary data is generally faster to read and write. This is a major reason why data stores for applications are rarely built using plain text files.
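A small Python sketch of the difference, using the standard `struct` module. The value is an arbitrary example, and the sketch only compares sizes and round trips; it does not measure speed.

```python
import struct

value = 2147483647  # largest signed 32-bit integer

as_binary = struct.pack("<i", value)   # raw 4-byte little-endian integer
as_text = str(value).encode("ascii")   # the same number written as decimal digits

print(len(as_binary), "bytes as binary")
print(len(as_text), "bytes as text")

# Reading back: binary is a direct unpack, text has to be parsed digit by digit.
decoded_binary = struct.unpack("<i", as_binary)[0]
decoded_text = int(as_text.decode("ascii"))
```

The binary form always takes four bytes, whatever the value. The text form grows with the number of digits, so a small number such as 7 is actually shorter as text.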
Issues with Binary Files
While binary files offer many benefits over plain text files, they also present several issues. Following are the most common problems faced when using binary files:
Difficult to manipulate
Binary files cannot be edited meaningfully with conventional text editors, so changing them by hand is a difficult task that calls for a hex editor or a format-specific tool. More often than not, applications choose to save their data using custom encoding schemes. This data can then be manipulated only inside applications that understand the same encoding scheme.
Efficiency gain is not uniform
While storing data in binary format might be fast and efficient in formats like PNG, other data types may not receive any noticeable performance benefits. Storing textual data inside a custom binary format even adds the extra hassle of encoding and decoding it every time it has to be viewed.
Can get confusing for machines
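Computers can have different ways of storing and accessing data. In particular, architectures differ in byte order (endianness), which is the order in which the bytes of a multi-byte value are laid out. When binary data is transmitted between two computers with different byte orders, issues like the NUXI problem can arise. If a computer saves “UNIX” in a binary file as two 16-bit words, and the file is opened on another computer with the opposite byte order, it might be read as “NUXI”. Text stored in byte-oriented encodings such as ASCII or UTF-8 is immune to this issue, because the standard fixes the order of the individual bytes. Encodings with multi-byte units, such as UTF-16, rely on a byte order mark instead.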
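The NUXI effect can be reproduced in a few lines of Python with the standard `struct` module. The helper name `swap_16bit_words` is hypothetical, and it assumes the input has an even number of bytes.

```python
import struct


def swap_16bit_words(data: bytes) -> bytes:
    """Read data as little-endian 16-bit words, then write the same values big-endian."""
    count = len(data) // 2
    words = struct.unpack(f"<{count}H", data)
    return struct.pack(f">{count}H", *words)


print(swap_16bit_words(b"UNIX"))  # the bytes inside each 16-bit word trade places

# The same 32-bit value laid out in the two byte orders:
value = 0x12345678
little = struct.pack("<I", value)
big = struct.pack(">I", value)
print(little.hex(), big.hex())
```

This is why binary formats usually state their byte order explicitly, and why `struct` format strings start with `<` or `>` when the data has to be portable.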
Where are Binary Files Used?
Having seen the various aspects of binary files, it is now important to understand where to use them. Here are some of the top use-cases of binary files:
- Software development. Compilers such as javac translate source code into bytecode, which the Java Virtual Machine (JVM) then executes. While the source code is stored in the form of text files, it does not make much sense to store the bytecode similarly. This is because source code has to be read by a developer in the process of creating the software, while the bytecode has to be read by the machine. It is generally faster for machines to process raw bytes than encoded characters.
- Image handling. As mentioned earlier, most image formats like PNG are stored as binary files to optimize performance and keep image files small. The same holds for video: if the data in a movie file were stored as encoded characters, a standard DVD would never have been able to hold a complete movie.
- Game development. Games perform a great many calculations on integers and character data. It is convenient to store these numbers in the form of bytes and operate on them directly, as repeatedly encoding and decoding values such as 32-bit integers as text adds up to a considerable amount of time.
- Storing large datasets. Datasets for tasks like machine learning model training are often required to be stored and made available to computers. While storing a dataset in a text file makes sense, as it can be viewed easily, it can pose performance issues for the machine.
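As a rough sketch of the trade-off for numeric data such as game state or datasets, the Python example below stores the same 1,000 floating-point values once as raw bytes and once as text. The sample values are arbitrary, and the sizes depend on the data: short numbers like 0.5 can be smaller as text.

```python
import math
from array import array

samples = [math.sin(i) for i in range(1000)]

# Binary: each value is stored as a raw 8-byte double.
binary_blob = array("d", samples).tobytes()

# Text: each value is written out as decimal digits, one per line.
text_blob = "\n".join(repr(x) for x in samples).encode("ascii")

# Reading the binary form back is a bulk copy with no number parsing.
restored = array("d")
restored.frombytes(binary_blob)

# Reading the text form back means parsing every line.
parsed = [float(line) for line in text_blob.decode("ascii").split("\n")]

print(len(binary_blob), "bytes as binary")
print(len(text_blob), "bytes as text")
```

Note that `array.tobytes()` uses the byte order of the machine it runs on, so a real file format would also need to record or fix the byte order.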
Conclusion
Text files are the most popular way to store data among everyday computer users, as they are easy to read and can support formatting as well.
But for people who are involved in software development, text files are often a poor fit for storing and processing large amounts of data inside programs. Application data that has to be read and written quickly is usually stored in binary files, while settings meant to be edited by hand are typically kept as text. Binary files offer speed and efficiency that text files struggle to match when carrying out operations on the stored data. If you’re looking to create an application that stores a lot of data in files, a binary file format is often the way to go!