Skip to main content

Remove Duplicate Rows in Excel in C# and VB.NET

When an Excel file contains a huge amount of records, there might be duplicate records as well. In this blog, I am going to show you how to remove the duplicate rows in an Excel file programmatically in C# and VB.NET.

The library I used:

Free Spire.XLS for .NET

Free Spire.XLS for .NET is a feature-rich Excel API offered by E-iceblue. It can be easily integrated in your .NET (C#, VB.NET, ASP.NET, .NET Core) applications to create, read, edit, convert and print Excel files without using Microsoft Office.

Before coding, you need to get Free Spire.XLS for .NET by installing it via NuGet or downloading it via the official website.

C# Code

  1. using Spire.Xls;  
  2. using System.Linq;  
  3.   
  4. namespace RemoveDuplicateRows  
  5. {  
  6.     class Program  
  7.     {  
  8.         static void Main(string[] args)  
  9.         {  
  10.             //Create a Workbook instance  
  11.             Workbook workbook = new Workbook();  
  12.             //Load the Excel file  
  13.             workbook.LoadFromFile("TEST.xlsx");  
  14.   
  15.             //Get the first worksheet  
  16.             Worksheet sheet = workbook.Worksheets[0];  
  17.   
  18.             //Specify the range that you want to remove duplicate records from.  
  19.             var range = sheet.Range["A1:A" + sheet.LastRow];  
  20.             //Get the duplicated row numbers    
  21.             var duplicatedRows = range.Rows  
  22.                    .GroupBy(x => x.Columns[0].DisplayedText)  
  23.                    .Where(x => x.Count() > 1)  
  24.                    .SelectMany(x => x.Skip(1))  
  25.                    .Select(x => x.Columns[0].Row)  
  26.                    .ToList();  
  27.   
  28.             //Remove the duplicate rows & blank rows if any             
  29.             for (int i = 0; i < duplicatedRows.Count; i++)  
  30.             {  
  31.                 sheet.DeleteRow(duplicatedRows[i] - i);  
  32.             }  
  33.   
  34.             //Save the result file  
  35.             workbook.SaveToFile("output.xlsx", ExcelVersion.Version2013);  
  36.         }  
  37.     }  
  38. }  

VB.NET Code

  1. Imports Spire.Xls  
  2. Imports System.Linq  
  3.   
  4. Namespace RemoveDuplicateRows  
  5.     Class Program  
  6.         Private Shared Sub Main(ByVal args As String())  
  7.             Dim workbook As Workbook = New Workbook()  
  8.             workbook.LoadFromFile("TEST.xlsx")  
  9.             Dim sheet As Worksheet = workbook.Worksheets(0)  
  10.             Dim range = sheet.Range("A1:A" & sheet.LastRow)  
  11.             Dim duplicatedRows = range.Rows.GroupBy(Function(x) x.Columns(0).DisplayedText).Where(Function(x) x.Count() > 1).SelectMany(Function(x) x.Skip(1)).[Select](Function(x) x.Columns(0).Row).ToList()  
  12.   
  13.             For i As Integer = 0 To duplicatedRows.Count - 1  
  14.                 sheet.DeleteRow(duplicatedRows(i) - i)  
  15.             Next  
  16.   
  17.             workbook.SaveToFile("output.xlsx", ExcelVersion.Version2013)  
  18.         End Sub  
  19.     End Class  
  20. End Namespace 

The input Excel file:


 The output Excel file:


 


Comments

Popular posts from this blog

3 Ways to Generate Word Documents from Templates in Java

A template is a document with pre-applied formatting like styles, tabs, line spacing and so on. You can quickly generate a batch of documents with the same structure based on the template. In this article, I am going to show you the different ways to generate Word documents from templates programmatically in Java using Free Spire.Doc for Java library. Prerequisite First of all, you need to add needed dependencies for including Free Spire.Doc for Java into your Java project. There are two ways to do that. If you use maven, you need to add the following code to your project’s pom.xml file. <repositories>               <repository>                   <id>com.e-iceblue</id>                   <name>e-iceblue</name>                   <url>http: //repo.e-iceblue.com/nexus/content/groups/public/</url>                </repository>       </repositories>       <dependencies>           <dependency>               <g

Simple Java Code to Convert Excel to PDF in Java

This article demonstrates a simple solution to convert an Excel file to PDF in Java by using free Excel API – Free Spire.XLS for Java . The following examples illustrate two possibilities to convert Excel to PDF:      Convert the whole Excel file to PDF     Convert a particular Excel Worksheet to PDF Before start with coding, you need to Download Free Spire.XLS for Java package , unzip it and import Spire.Xls.jar file from the lib folder in your project as a denpendency. 1. Convert the whole Excel file to PDF Spire.XLS for Java provides saveToFile method in Workbook class that enables us to easily save a whole Excel file to PDF. import com.spire.xls.FileFormat; import com.spire.xls.Workbook; public class ExcelToPDF {     public static void main(String[] args){         //Create a Workbook         Workbook workbook = new Workbook();         workbook.loadFromFile( "Sample.xlsx" );         //Fit to page         workbook.getConverterSetting().setShee

Insert and Extract OLE objects in Word in Java

You can use OLE (Object Linking and Embedding) to include content from other programs, such as another Word document, an Excel or PowerPoint document to an existing Word document. This article demonstrates how to insert and extract embedded OLE objects in a Word document in Java by using Free Spire.Doc for Java API.   Add dependencies First of all, you need to add needed dependencies for including Free Spire.Doc for Java into your Java project. There are two ways to do that. If you use maven, you need to add the following code to your project’s pom.xml file.     <repositories>               <repository>                   <id>com.e-iceblue</id>                   <name>e-iceblue</name>                   <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>               </repository>       </repositories>       <dependencies>           <dependency>               <groupId>