Data Structures – Choosing the Right One
Imagine if you walk into a library looking for a specific book, but the books in the library aren’t organised in any way. They aren’t in alphabetical order or shelved by genre. All you can see are piles of books that reach to the ceiling.
When you ask the librarian to get the book for you, she tells you to leave your phone number, so she can call you when she finds the book. There are more than 10,000 books in the library, and she has to search through each pile until she finds the book you want. It could take an hour, a day, a month, or even a year.
Fortunately, libraries do organize their books, and so finding a book usually doesn’t take very long.
Organizing a large number of items is critical when you want to retrieve an item in an efficient manner. In the world of software development, using a data structure is one way to organize data. Choosing the right data structure for your application is critical. It can be the difference between an application that flies, and one that crawls.
What is a Data Structure?
Wikipedia defines a data structure as “a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.”
There are three characteristics in that definition:
- A collection of data values
- The relationships between them
- The functions or operations that can be applied to the data.
A collection of data values
Data structures are a way to organize a collection of data values. For example, an array is a data structure. You can store more than one value in an array.
The relationships between them
An array is only one type of data structure. It differs from others in the way it organises the data (“the relationships between them”). When you use an array, the data values are stored sequentially in one contiguous block in memory.
Another type of data structure is a linked list. A linked list is also a sequential data structure, like the array, but the values aren’t stored as one contiguous block. In a singly-linked list, each node in the list contains the value, and a field that points to the next value in the list.
(from Wikipedia – public domain image)
This linked list contains 3 values: 12, 99, and 37. The value 12 is at the head of the list.
As you can see, the relationship between items in a linked list is different than the relationship between items in an array. They are distinct data structures.
The functions or operations that can be applied to the data
The third characteristic that describes a data structure is what operations you can perform on the values stored in the structure.
In an array, you can add, remove, and retrieve items using indices. For example, if you have a 10-element integer array that contains the values 10, 20, 30, 40, and 50, you can add 60 to the array by assigning it to index 5 (array indices are usually 0-based).
If you want to retrieve the value 40, you do so using array.
To remove an item from the array, you can set the value to 0 at the specified index, or you can shift the remaining elements down by one position.
You can do the same operations on a singly-linked list, but the way you add, remove, and retrieve items is different. Let’s look at the diagram again.
Unlike an array, you don’t have instant access to every item in the list. All you have is the reference to the first node in the list (the head reference). Because of that, when working with a singly-linked list, you always add items to the head of the list, and remove items from the head of the list.
Yes, you could add an item between values 99 and 37, but to do so, you’d have to start at the head of the list and follow the next pointers until you found 99. In a 3-element list, it wouldn’t take long, but if your list contained thousands of items, performance would be an issue. So one of the characteristics of a linked list is that they don’t allow insertions and deletions anywhere. They only allow them at the head of the list.
For example, in Java, a singly-linked list class wouldn’t contain a method that allows you to insert an item anywhere in the list. (note that the LinkedList class in the JDK is a doubly-linked list, and so its behaviour is different.)
Why would you use a singly-linked list rather than an array? If you want stack (LIFO) behaviour, where the last item added is the first item removed, singly-linked lists are a perfect fit. That’s one example of when a linked list would be a better choice than an array.
On the other hand, if you want to be able to quickly access any item in the data structure, an array would be the better choice.
Data Structures in Java
Java’s JDK contains classes for most of the common data structures, so you don’t have to write your own. For example, the LinkedList class is an implementation of a doubly-linked list. HashMap is an implementation of the hashtable data structure.
It’s wonderful that the JDK contains classes for most of the data structures you’d consider using in your applications, but you shouldn’t use a data structure without having some understanding of how it’s organized, and how it works.
Advantages and Disadvantages
We’ve taken a very quick look at the array and singly-linked list data structures, and seen how each one could be the better choice to use in an application, depending on the behaviour we want. There are other data structures to choose from: trees, hashtables, graphs, and more.
It’s important for you to have some understanding of how each data structure works, meaning how it organises the data and what operations it allows. Armed with this knowledge, you’ll be able to choose the best data structure for your application. Doing so could be the difference between an application that performs well, and one that performs poorly because the data structure it’s using to organise data isn’t a good fit for the operations it frequently performs.
Data structures are a way to organise data. A data structure is defined by how it organises the data, and by the operations it allows us to perform on the data. It’s important to choose the right data structures and algorithms for your next project. You can blindly use the classes that Java and other languages provide, but understanding how each one works and choosing the right one for an application is a critical skill in software development.
“For example, in Java, a singly-linked list class wouldn’t contain a method that allows you to insert an item anywhere in the list.” Doesn’t this directly contradict with the very definition of LinkedList that it is an ordered sequence that allows efficient insertion and removal at any location?
A single linked list class, by its very design, isn’t designed for that. As pointed out in the article, Java’s implementation of LinkedList is a doubly-linked list and does allow you to efficiently insert and remove at any location.