Remove Duplicates From Python List | 12 Ways With Code Examples
In Python, lists are one of the most versatile and commonly used data structures, allowing you to store multiple items in a single variable. However, when working with lists, you may often encounter duplicate values, which can clutter your data or lead to incorrect results in computations. Removing duplicates from a list is a common operation, whether you're cleaning data, optimizing performance, or ensuring uniqueness.
There are multiple ways to remove duplicates from Python lists, which we will discuss in this article, with code examples. We’ll also compare these methods to help you choose the best one for your needs. So let’s get started!
How To Remove Duplicates From A List In Python?
Here’s a quick overview of the methods to remove duplicates from Python lists we will cover:
- The set() method
- For loop
- List comprehension
- List comprehension with enumerate()
- Dictionary and fromkeys()
- The in and not in operators
- The collections.OrderedDict.fromkeys()
- Counter with frequency distribution
- The del keyword
- Pandas DataFrame
- pd.unique() and np.unique() from pandas and NumPy
- The reduce() function
The set() Function To Remove Duplicates From Python List
The built-in Python function set() is one of the most straightforward and commonly used methods to remove duplicates from a Python list. The set data structure automatically ensures that all elements are unique, making it a quick and efficient solution.
How It Works?
- Convert list to set: We first convert the Python list into a set, which is an unordered collection of unique elements.
- Duplicates automatically removed: Converting the list to a set automatically discards any duplicate entries, since sets do not support duplicate elements.
- Convert set to a list: Use the list() function to convert the set back to a list. Note that since sets are unordered, this step ensures the final result is in a list format, but the order will not be preserved.
Code Example:
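A minimal sketch of this approach, using the original_list and unique_list names from the explanation below:

```python
# List containing duplicate values
original_list = [1, 2, 2, 3, 4, 4, 5]

# Converting to a set drops duplicates; converting back gives a list
unique_list = list(set(original_list))

print("Original List:", original_list)
print("Unique List:", unique_list)
```

Note that the order shown in the output is not guaranteed; sets are unordered, and the result only happens to come out sorted here for small integers in CPython.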
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the simple Python program example,
- We begin by creating a list called original_list, which contains duplicate elements: [1, 2, 2, 3, 4, 4, 5]. The goal is to remove the duplicates and create a new list with only unique values.
- Then, we use the set() function, passing original_list as an argument, to convert the list to a set.
- The function creates a set containing {1, 2, 3, 4, 5}—the duplicate values 2 and 4 are eliminated in this process.
- Next, we use the list() function to convert the set back to a list and assign the result to unique_list.
- Now, unique_list contains only the unique elements from the original list, but the order of elements is not preserved due to the nature of sets.
- Finally, we use the print() function to display both the original list and the new unique list for comparison.
This demonstrates how easily duplicates can be removed using the set() function. However, note that this method is not suitable if you need to preserve the original order of elements in the list.
Remove Duplicates From Python List Using For Loop
The for loop is a manual approach to remove duplicates from a Python list. It involves iterating through the original list and appending elements to a new list only if they are not already present. This method is ideal when you need to preserve the original order of the elements and avoid using additional libraries.
How It Works?
- Create an empty list: Start by initializing an empty list where only unique elements will be stored.
- Iterate through the original list: Use a for loop to go through each element in the list one by one.
- Check if the element is already present: For each element in the original list, check if it is in the new list.
- If not present, append it to the new list.
- If already present, skip it.
Code Example:
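A minimal sketch of the loop-based approach described above:

```python
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []  # will hold only the first occurrence of each value

for item in original_list:
    # Append the item only if it has not been seen before
    if item not in unique_list:
        unique_list.append(item)

print("Original List:", original_list)
print("Unique List:", unique_list)
```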
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the simple Python code example,
- We start with the list original_list, which contains duplicate values.
- Then, we create an empty list unique_list to store unique elements.
- Next, we use a for loop to iterate over each item in original_list.
- Inside, we have an if-statement which checks whether the current item is already present in unique_list.
- If it isn’t, we use the append() method to add it to the unique_list.
- If it is there, the flow skips the if-block, and we move to the next iteration.
- After the iterations, the unique_list will contain only the unique elements [1, 2, 3, 4, 5] while maintaining the original order. We print both lists to the console.
This method preserves order but can be slow for large lists due to repeated membership checks.
Remove Duplicates From Python List Using List Comprehension
List comprehension offers a compact way to remove duplicates from a list while preserving the order of elements. By combining list comprehension with a condition that checks for duplicates, we can filter out repeated values efficiently.
How It Works?
- Create an empty list: Initialize an empty list before the comprehension; it will hold the unique elements as they are appended.
- Iterate through the original list: The list comprehension iterates through each item in the original list.
- Check for uniqueness: For each element, check if it is already in the result list.
- If not present, add it to the result.
- If it is already present, skip it.
Code Example:
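A minimal sketch of this pattern; note that it uses the comprehension for its side effect on unique_list (a style some linters flag), which matches the step-by-step explanation below:

```python
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []

# The comprehension appends each unseen item to unique_list;
# the throwaway list of None values it builds is discarded
[unique_list.append(item) for item in original_list if item not in unique_list]

print("Original List:", original_list)
print("Unique List:", unique_list)
```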
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the Python program example:
- We begin with original_list, which contains duplicates, and create an empty list, unique_list, to store the unique items.
- Then, we use list comprehension to iterate through each element in original_list.
- The condition if item not in unique_list ensures that each element is checked for uniqueness.
- If it is not already in the unique_list, it is appended to the list.
- If it is there in unique_list, we ignore the element and move to the next.
- The result is that unique_list contains only the unique elements [1, 2, 3, 4, 5], with the original order preserved.
This approach in Python programming is concise and readable, but performance can suffer for large lists due to repeated membership checks.
Remove Duplicates From Python List Using enumerate() With List Comprehension
By combining list comprehension with the enumerate() function, we can remove duplicates from a list while keeping track of the index positions of elements. This approach is especially useful when we want to filter duplicates based on the order of their first occurrence while leveraging the power of enumerate() to access both the element and its index in the original list.
How It Works?
- Build the result list: The list comprehension builds the list of unique elements directly as it iterates.
- Use enumerate(): The enumerate() function in Python provides both the index and the value of each element while iterating through the list.
- Check for uniqueness: The list comprehension checks whether an element has appeared before by using its index.
- If it’s the first time the element appears, it is added to the result list.
- If the element is a duplicate, it is skipped.
Code Example:
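A minimal sketch, assuming the slicing-based check described below:

```python
original_list = [1, 2, 2, 3, 4, 4, 5]

# Keep an item only if it does not appear in the slice before its index
unique_list = [item for index, item in enumerate(original_list)
               if item not in original_list[:index]]

print("Original List:", original_list)
print("Unique List:", unique_list)
```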
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In this Python code example:
- We begin with the list original_list, which contains duplicate values; the comprehension builds unique_list directly.
- Then, we use list comprehension to iterate through each element in original_list using enumerate() to get both the index and the item.
- original_list[:index] checks if the item already appeared earlier in the list by slicing the list up to the current index.
- If the item has not been encountered before, it is added to unique_list. If it has, it is skipped.
- The result is that unique_list contains only the unique elements [1, 2, 3, 4, 5], and the original order is preserved.
This method removes duplicates based on the first occurrence but can be slower for large lists due to repeated slicing.
Dictionary & fromkeys() Method To Remove Duplicates From Python List
The fromkeys() method of a dictionary provides a clever way to remove duplicates from a list. Since dictionary keys are inherently unique, this method takes advantage of that property. By converting the list into a dictionary where the list elements serve as keys, duplicates are automatically removed. Afterward, you can convert the dictionary back to a list to get the unique elements.
How It Works?
- Convert the list to a dictionary: Use fromkeys() to create a dictionary where each element of the list becomes a key. Since dictionary keys must be unique, duplicates are eliminated in the process.
- Convert the dictionary back to a list: Once duplicates are removed, convert the dictionary back into a list to get the final list of unique elements.
- Preserve the order: Since Python 3.7, dictionaries maintain insertion order, so this method also preserves the original order of elements. In older versions, dictionaries were unordered and order was not guaranteed.
Code Example:
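A minimal sketch using the built-in dict type:

```python
original_list = [1, 2, 2, 3, 4, 4, 5]

# Dictionary keys are unique, so duplicates disappear on conversion;
# insertion order is preserved on Python 3.7+
unique_list = list(dict.fromkeys(original_list))

print("Original List:", original_list)
print("Unique List:", unique_list)
```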
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the example Python program:
- We first initialize a list original_list with duplicate values.
- Then, we use the fromkeys() method to create a dictionary where each element of original_list becomes a key. Since dictionary keys are unique, the duplicates are automatically removed.
- We then convert the dictionary back to a list using list(), which results in the list [1, 2, 3, 4, 5], containing only the unique elements.
- The order is preserved (since Python 3.7+), and the output shows the desired result.
While this method is concise and efficient for removing duplicates, it may not be ideal for older Python versions (prior to 3.7), where the order of elements in a dictionary is not guaranteed.
Remove Duplicates From Python List Using in, not in Operators
This method uses a simple and intuitive approach by leveraging the in and not in operators within a for loop. By iterating through the original list and checking if each element is already present in a new list, duplicates are removed while maintaining the order.
How It Works?
- Create an empty list: We initialize an empty list to store the unique elements.
- Iterate through the original list: A for loop is used to go through each element of the original list.
- Check if the element is not already in the new list: For each element, check if it is already in the new list using the not in operator.
- If the element is not present, append it to the new list.
- If it is already present, skip it.
Code Example:
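A minimal sketch of this approach (it is the same loop as before, shown here to highlight the not in membership check):

```python
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []

for item in original_list:
    # not in performs a linear scan of unique_list
    if item not in unique_list:
        unique_list.append(item)

print("Original List:", original_list)
print("Unique List:", unique_list)
```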
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the example Python code:
- We initialize a list original_list with some duplicate values and create an empty list, unique_list, to store only unique items.
- Then, we use a for loop to iterate through each item in original_list, using the in and not in operators for membership checks.
- The condition if item not in unique_list checks if the current item is already in the unique_list.
- If the item is not present, it is appended to unique_list. If it is already present, the loop moves to the next element.
- After the loop finishes, unique_list contains only unique elements [1, 2, 3, 4, 5], preserving the original order.
This method works well for small lists, but it can become inefficient for larger lists: each in check takes O(n) time, making the overall approach O(n²).
Remove Duplicates From Python List Using collections.OrderedDict.fromkeys()
The OrderedDict from the collections module is another excellent way to remove duplicates from a Python list while preserving the order of the elements. The OrderedDict is a dictionary subclass that remembers the order in which items are inserted. By using its fromkeys() method, we can take advantage of its unique key properties to eliminate duplicates from the list.
How It Works?
- Create an OrderedDict: Use the OrderedDict.fromkeys() method to create an ordered dictionary, where the elements of the list become the keys. Since dictionary keys must be unique, duplicates are automatically removed.
- Convert the OrderedDict back to a list: After duplicates are removed, the keys of the OrderedDict are converted back into a list.
This step results in a list that contains only the unique elements, preserving their original order.
Code Example:
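A minimal sketch using collections.OrderedDict:

```python
from collections import OrderedDict

original_list = [1, 2, 2, 3, 4, 4, 5]

# Keys of an OrderedDict are unique and remember insertion order
unique_list = list(OrderedDict.fromkeys(original_list))

print("Original List:", original_list)
print("Unique List:", unique_list)
```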
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the sample Python code:
- We begin with the original_list, which contains duplicate elements.
- Then, calling the OrderedDict.fromkeys() method on the list creates an ordered dictionary, where each element in original_list becomes a key. Since dictionary keys must be unique, duplicates are automatically removed.
- We then convert the keys of the OrderedDict back to a list using list(), which results in [1, 2, 3, 4, 5], the unique elements from the original list.
- This method preserves the original order, as OrderedDict maintains the insertion order of its elements.
The OrderedDict approach is efficient for preserving order but slightly slower than set() for large lists; ideal when order matters.
Remove Duplicates From Python List Using Counter With Frequency Distribution
The Counter class from the collections module is typically used to count the occurrences of items in an iterable. However, it can also be used to remove duplicates by leveraging its frequency distribution properties. By converting the list into a Counter object, we can easily remove duplicates while still keeping track of the frequency of each element, although, in the context of removing duplicates, we only care about the keys (unique elements).
How It Works?
- Use Counter: Convert the original list into a Counter object. Each unique element in the list becomes a key in the Counter, with its value being the count of occurrences.
- Extract keys: Since duplicates are eliminated at this stage, you can extract the keys of the Counter, which are the unique elements.
- Convert back to list: Convert the keys of the Counter object back to a list.
Code Example:
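A minimal sketch using collections.Counter:

```python
from collections import Counter

original_list = [1, 2, 2, 3, 4, 4, 5]

# Counter maps each element to its frequency; its keys are unique
frequency = Counter(original_list)
unique_list = list(frequency.keys())

print("Original List:", original_list)
print("Unique List:", unique_list)
```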
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the basic Python program example:
- We have a list original_list with duplicate elements.
- Then, we use the Counter() constructor to create a Counter object from the list, where each unique element in original_list becomes a key. The value for each key represents the frequency of that element in the list.
- We then use the keys() method to extract the unique elements (keys) from the Counter and convert them into a list using list(). This results in [1, 2, 3, 4, 5].
- The output shows the unique elements, with duplicates removed, and the order of elements is maintained.
This approach is both concise and effective for removing duplicates from the Python list while tracking frequencies; however, order is not guaranteed in Python versions < 3.7.
Remove Duplicates From Python List Using The del Keyword
The del keyword in Python is typically used to delete variables or elements from data structures like lists. While it may not be the most commonly used approach for removing duplicates, it can still be leveraged effectively in combination with a loop to eliminate duplicate elements from a list. This method works by iterating through the list, and if a duplicate is found, the del keyword removes the element from the list.
How It Works?
- Iterate through the list: We begin by iterating through the list from the first to the last element.
- Check for duplicates: For each element, we check if it already exists in the sublist before it (this can be done using slicing or index-based checking).
- Delete duplicates: If a duplicate is found, we use the del keyword to remove the element from the list.
Code Example:
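A minimal sketch of the in-place approach described above:

```python
original_list = [1, 2, 2, 3, 4, 4, 5]
index = 0

while index < len(original_list):
    # Check whether the current element appeared earlier in the list
    if original_list[index] in original_list[:index]:
        del original_list[index]  # remove the duplicate in place
    else:
        index += 1

# The list was modified in place, so it now holds only unique values
print("Original List:", original_list)
```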
Output:
Original List: [1, 2, 3, 4, 5]
Explanation:
In the Python code sample,
- We start with the original_list, which contains duplicate values.
- Then, we initialize an index variable to keep track of iterations through the list.
- Next, we use a while loop with an if-else statement. It checks if the current element (at original_list[index]) exists in the sublist before it (original_list[:index]).
- If it is found, we use the del keyword to remove it from the list.
- If it is not found, we move to the next element by incrementing the index.
- After the loop completes, the list contains only the unique elements [1, 2, 3, 4, 5], with duplicates removed.
This method works well, but it can be less efficient than others because each time an element is checked using slicing (original_list[:index]), a new list must be created.
Remove Duplicates From Python List Using DataFrame
When working with data, particularly in Python, the pandas library is often used for handling tabular data. In cases where lists need to be manipulated, a list can be converted to a DataFrame, and duplicates can be removed using pandas functionality. This method takes advantage of pandas' efficient handling of data and is useful when working with larger datasets or when other data processing tasks are required in conjunction with removing duplicates.
How It Works?
- Convert the list to a DataFrame: First, we convert the original list into a pandas DataFrame. This step is helpful when dealing with more complex data structures or when additional data manipulation is necessary.
- Remove duplicates: Use the drop_duplicates() method available in pandas to remove duplicate rows from the DataFrame.
- Convert back to a list: After removing duplicates, convert the DataFrame back into a Python list which contains only the unique elements.
Code Example:
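A minimal sketch, assuming a single-column DataFrame with an arbitrary column name ('values'):

```python
import pandas as pd

original_list = [1, 2, 2, 3, 4, 4, 5]

# Wrap the list in a one-column DataFrame
df = pd.DataFrame(original_list, columns=['values'])

# drop_duplicates() keeps the first occurrence of each row
unique_list = df.drop_duplicates()['values'].tolist()

print("Original List:", original_list)
print("Unique List:", unique_list)
```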
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the example:
- We start with the original_list that contains duplicate values.
- Then, we use the pd.DataFrame() constructor, passing original_list and a column name, to convert the list into a single-column DataFrame. This step is necessary to leverage pandas functionality.
- Next, we use the drop_duplicates() method to remove duplicates from the DataFrame. This method automatically keeps the first occurrence of each unique element.
- Finally, we use the tolist() function to convert the column of the DataFrame back into a Python list, resulting in the list [1, 2, 3, 4, 5].
This method is best for large datasets or complex workflows but requires importing pandas and may be overkill for small lists.
Remove Duplicates From Python List Using pd.unique() And np.unique()
Both pandas and numpy offer efficient methods for handling unique elements in an iterable. The pd.unique() function in pandas and np.unique() in numpy are widely used to find the unique values in a list while removing duplicates. These methods are particularly useful when you’re already working with pandas DataFrames or numpy arrays, or when you need high-performance solutions for larger datasets.
How It Works?
- Use pd.unique() or np.unique(): Both methods take an iterable (like a list) and return the unique elements, effectively removing duplicates.
- pd.unique() is part of the pandas library and is typically used for handling pandas Series or DataFrame columns.
- np.unique() is part of the numpy library and is used for finding unique values in a numpy array.
- Convert to list: pd.unique() returns the unique elements in their order of first appearance, while np.unique() returns them sorted in ascending order. Both return numpy arrays, so use tolist() to convert the result back to a Python list.
Code Example:
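A minimal sketch showing both functions side by side:

```python
import numpy as np
import pandas as pd

original_list = [1, 2, 2, 3, 4, 4, 5]

# pd.unique() preserves the order of first appearance
unique_pd = pd.unique(pd.Series(original_list)).tolist()

# np.unique() returns the unique values sorted in ascending order
unique_np = np.unique(original_list).tolist()

print("Unique List using pd.unique():", unique_pd)
print("Unique List using np.unique():", unique_np)
```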
Output:
Unique List using pd.unique(): [1, 2, 3, 4, 5]
Unique List using np.unique(): [1, 2, 3, 4, 5]
Explanation:
In the sample Python program:
- We use pd.unique() from the pandas library on original_list, which returns the unique elements in their order of first appearance.
- The np.unique() function from the numpy library also returns the unique elements, but sorted in ascending order; because our input is already sorted, the two results happen to match.
- Both methods return a numpy array by default, so we use tolist() to convert the array back into a Python list.
- The final output is the list [1, 2, 3, 4, 5], where duplicates have been removed.
Both pd.unique() and np.unique() are efficient for removing duplicates in pandas or numpy contexts, but require importing the respective Python libraries.
Remove Duplicates From Python List Using The reduce() Function
The reduce() function from the functools module is another interesting approach for removing duplicates from a Python list. Although it's typically used for performing cumulative operations (like summing a list or multiplying its elements), it can also be leveraged to accumulate only unique elements from a list.
How It Works?
- Use reduce(): The reduce() function applies a binary function cumulatively to the items of the iterable, from left to right. In this case, we can use it to check if an element has already been added to the result list. If it hasn't, we append it.
- Accumulator: The accumulator in the reduce() function holds the intermediate results, which, in this case, will be the list of unique elements.
Code Example:
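A minimal sketch using functools.reduce with a lambda accumulator:

```python
from functools import reduce

original_list = [1, 2, 2, 3, 4, 4, 5]

# acc starts as []; each step appends x only if it has not been seen
unique_list = reduce(
    lambda acc, x: acc + [x] if x not in acc else acc,
    original_list,
    []
)

print("Original List:", original_list)
print("Unique List:", unique_list)
```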
Output:
Original List: [1, 2, 2, 3, 4, 4, 5]
Unique List: [1, 2, 3, 4, 5]
Explanation:
In the example code, we have a list original_list with duplicate elements.
- We use the reduce() function with a lambda function as the accumulator.
- The lambda function checks if the current element x is not already in the accumulator (acc).
- If it's not present, it adds x to the list; otherwise, it keeps the list unchanged.
- The accumulator starts as an empty list ([]), and as reduce() processes the original_list, it accumulates only the unique elements.
- The final result, unique_list, contains the unique elements [1, 2, 3, 4, 5].
The reduce() approach is elegant but less efficient for large datasets due to repeated checking; useful in complex scenarios or when already using reduce().
Comparative Analysis Of Ways To Remove Duplicates From Python List
Now that we've covered several methods to remove duplicates from a Python list, it's time to compare them in terms of their efficiency and use cases. Below is a table summarizing the complexity and characteristics of each method.
| Method | Time Complexity | Space Complexity | Remarks |
|---|---|---|---|
| set() | O(n) | O(n) | Simple and fast; order of elements is not preserved. |
| List comprehension | O(n²) | O(n) | Keeps the order intact; more Pythonic than loops, but still relies on repeated membership checks. |
| For loop | O(n²) | O(n) | Not efficient for large lists due to repeated membership checks. |
| List comprehension + enumerate() | O(n²) | O(n) | Preserves order; repeated slicing makes it costly for large lists. |
| dict.fromkeys() | O(n) | O(n) | Order is not guaranteed in older Python versions (< 3.7); efficient in most cases. |
| in / not in operators | O(n²) | O(n) | Less efficient due to repeated membership checks. |
| OrderedDict.fromkeys() | O(n) | O(n) | Maintains insertion order; works well for large lists. |
| Counter() | O(n) | O(n) | Provides frequency counts; best when additional analysis is needed. |
| del keyword | O(n²) | O(n) | Not ideal for this task; each check slices the list, creating temporary copies. |
| DataFrame (pandas) | O(n) | O(n) | Efficient with large datasets, but requires pandas overhead. |
| pd.unique() / np.unique() | O(n) / O(n log n) | O(n) | Ideal when working with pandas or numpy; np.unique() sorts its output. |
| reduce() | O(n²) | O(n) | Elegant but inefficient for large lists. |
As is evident from the table:
- Efficient Methods: set() and dict.fromkeys() both run in O(n); set() doesn’t preserve order, while dict.fromkeys() does on Python 3.7+.
- Order Preservation: Use dict.fromkeys(), OrderedDict.fromkeys(), or pd.unique() if maintaining the original order is essential.
- Best for Large Datasets: Counter() and DataFrame methods are optimized for large datasets, but DataFrame introduces overhead.
- Inefficient Methods: del, in/not in loops, membership-checking list comprehensions, and reduce() are slower (O(n²)) and should be avoided for larger lists.
- Special Cases: Counter() is useful for frequency analysis, and np.unique()/pd.unique() work well in numpy or pandas contexts.
Conclusion
Removing duplicates from a Python list is a common task that can be approached in multiple ways, each with its own advantages and trade-offs. We've discussed 12 distinct methods, ranging from simple and intuitive solutions like the set() function and list comprehension to more complex methods involving Counter(), pandas, and reduce().
Each method has its own performance characteristics, making it suitable for different scenarios based on the size of the data and the need to preserve element order. The best method for removing duplicates depends on your specific needs—whether it’s efficiency, maintaining order, or handling large datasets.
Frequently Asked Questions
Q1. How do I remove duplicates from two lists in Python?
You can remove duplicates from two lists by concatenating them and converting the result to a set(), then converting it back to a list. For example:
list1 = [1, 2, 3]
list2 = [2, 3, 4]
unique_list = list(set(list1 + list2))
This removes duplicates from both lists.
Q2. How do I remove duplicates from a nested list in Python?
To remove duplicates from a nested list (where each element is a list), convert each inner list to a tuple (tuples are hashable), build a set, and convert back; note that the original order is not preserved. For example:
nested_list = [[1, 2], [2, 3], [1, 2]]
unique_list = [list(item) for item in set(tuple(i) for i in nested_list)]
Q3. How do I remove duplicates from a list of lists?
Removing duplicates from a list of lists works similarly to the nested list example. Use the set method with tuple conversions. For example:
list_of_lists = [[1, 2], [2, 3], [1, 2]]
unique_list = [list(item) for item in set(tuple(i) for i in list_of_lists)]
Q4. How do I remove duplicates from two lists?
You can concatenate two lists and use set() to remove duplicates. For example:
list1 = [1, 2, 3]
list2 = [3, 4, 5]
unique_list = list(set(list1 + list2))
Q5. How do I remove duplicates from a nested array?
Similar to nested lists, you can remove duplicates by converting the elements of lists to tuples, using set(), and then converting them back to a list. For example:
nested_array = [[1, 2], [2, 3], [1, 2]]
unique_array = [list(item) for item in set(tuple(i) for i in nested_array)]
Q6. Does sorted() remove duplicates from Python lists?
No, the sorted() function does not remove duplicates. It only sorts the elements in ascending order. If you want to sort and remove duplicates from Python lists, use sorted(set(list)). For example:
my_list = [3, 1, 2, 3, 2]
unique_sorted_list = sorted(set(my_list))
Do check the following out:
- Python Reverse List | 10 Ways & Complexity Analysis (+Examples)
- Python Assert Keyword | Types, Uses, Best Practices (+Code Examples)
- Python Strings | Create, Format, Reassign & More (+Examples)
- Python input() Function (+Input Casting & Handling With Examples)
- Convert Int To String In Python | Learn 6 Methods With Examples
- Python max() Function With Objects & Iterables (+Code Examples)