Dynamic Excel file loading with SSIS (2025)

  • Download source - 108.2 KB
  • Download demo - 149.3 KB

Introduction

Loading Excel files with same schema but different file nameor sheet name is a challenge for many SSIS developers. Microsoft does not haveany Dynamic Excel loading component that search Excel files using a pattern andindentify the sheet contains data and load it.

The article helps you to challenge such scenarios. Thepackage described here is able to search Excel files using a pattern and loadit without any user intervention. TheSheet contains data does not have start at first row or first column.

Features

  • Filename can be anything(ie it should match thepattern used for searching)
  • Format of excel can be any format (.xls – Office2003, .xlsx – Office 2007 or later)
  • Data to be loaded can be any sheet. Zerodependency on sheet name.
  • Data can start at any different column or row,ie it does not have to be at A1 or A2 or B10 etc.
  • Report out if any additional fields are added tothe excel data after the design.
  • Archive processed files to Processed folder.

Building the package

  1. Create Folder structure(please feel free to create anywhere, for simplicity I use C:\ drive).

a.Create folderSSISLoadinC:\

b.Create subfolder inDataonC:\SSISLoad

c.Create subfolder inCountryonC:\SSISLoad\Data

d.Create subfolder inTestonC:\SSISLoad\Data\Country

e.Create subfolder inProcessedonC:\SSISLoad\Data\Country<o:p>

2.Package Design Preparations

a.Identify the Sheet that contains data. Rename it toData

b.Delete empty rows and columns such that data starts at cellA1

c.Save it asCountry Details - XX.xlsx& Copy toC:\SSISLoad\Data\Country\Test\

d.Create a new SSIS PackageLoad Dynamic Excel.dtsx

e.Create and Set package variable (string type)Input_Share_RootasC:\SSISLoad\Data\

f.Create and Set package variable (string type)DataFile_Input_PathasCountry\

g.Create and Set package variable (string type)DataFile_Search_PatternasCountry;.xlsx

h.Create package variable (string type)NewFieldsSummary

i.Create package variable (string type)NewFields

j.Create package variable (boolean type)FileFound

k.Create package variable (string type)SourceFile

l.Create package variable (string type)Extraction_Statement

m.Create package variable (string type)Connection_String

Save the package.

3. Setpackage variableConnection_String

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\SSISLoad\Data\Country\Test\Country Details - XX.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=YES;IMEX=1";

4.Preparingpackage variableExtraction_Statement
  • Save the package.

a.Add aDataFlowComponent to package(Drag DataFlow from Tools), name it asLoad Excel

b.Add a Excel connection to package(Right Click Connection manager area >> New Connection >> Excel >> Browse and Point to C:\SSISLoad\Data\Country\Test\ Country Details - XX.xlsx Name the connection asExcel_Source

c.Right clickExcel_Sourceconnection >>Properties >> Expression >> Click Ellipse button

i.PickConnection Stringfor Property from drop down , click Ellipse on Expression

ii.Set expression as @[User::Connection_String] or Drag it from Variables on the expression builder window.

d.Go toLoad Excel data flow and addExcel Sourceby dragging from Tool Box, name it asCountry Details

e.EditCountry Details

f.Set OLEDB Connection managerasExcel_Source(from drop down)

g.SetData Access Modeas SQL Command (from drop down)

h.Click Build Query

i.Click Add Table symbol and Select Data$ and click Add, Click Close

j.Manually check each and every column, do not check *(All Columns)

k.Copy the SQL Generated from the query pane:

SELECTID, StateProvinceCode, CountryRegionCode, IsOnlyStateProvinceFlag, Name, TerritoryID

FROM[Data$]

l.Cancel the Build Query process. Cancel Excel Source Editor window.

m.Make the SQL in single line and set as value forExtraction_Statement

n.Save the package.

5.SetCountry Detailsexcel sourcecomponent

a.EditCountry Detailsand set Connection managerasExcel_Source(from drop down)

b.SetData Access Modeas SQL Command from variable(from drop down)

c.SetVariable nameasUser:: Extraction_Statement(from drop down)

d.Click Columns from Left pane.

e.See all columns are populated (you may uncheck any column you do not want). Click OK

6.Set Destination as you like and map columns.

a.Create OLEDB Destination Connection managerOLEDB_Destination_Conn(sample used local server, Test as DB

b. Create table CountryDetails and map columns

Write Automation Code

1. Search excel file using the pattern specified in variable DataFile_Search_Pattern.

This patternseparated by extension using semicolon format

2. Enumerate Excel sheets for the file obtained by Search
a.Use OLEDB driver to create connection to Excel and read schema.


3. Identify Sheet contains data and data start address.
a.Search each sheet for fields used in Extraction_Statement and determine sheetname and data start address.
b.Modify connection string as per the excel file.

Using the Code

How it works(Concept)

The package search for the file using the specification supplied, determines the sheet contains data and its start address(like A1 OR C10 etc), and finally modifies the connection string to the Excel file to point the file obtained and modifies SQL Query to read the excel file from the sheet identified.This level of dynamic loading is achieved with the help of user defined variables in the package, and thus the run-time setting of Connection Managers, SQL Command to read/pull data are updated with the help of variables.

Let's now analyze concept (code perspective) as below:

The following method searches file in the path supplied. It uses matching pattern as parametersearchstringanduse parameter extension as filter

C#

public string SearchFile(string path, string extension, string searchstring) { DirectoryInfo di = null; if (Directory.Exists(path)) { di = new DirectoryInfo(path); } else return "Directory Does not Exist"; string newestFile; IEnumerable<System.IO.FileInfo> fileList = di.GetFiles("*" + searchstring.ToLower() + "*"); //Create the query IEnumerable<System.IO.FileInfo> fileQuery = from file in fileList where (extension.ToLower().Contains(file.Extension.ToLower())) orderby file.LastWriteTime select file; try { var FileSearchedResult = (from file in fileQuery orderby file.LastWriteTime select new { file.FullName, file.Name, file.CreationTime }).Last(); newestFile = FileSearchedResult.FullName; FileSearchedResult = null; fileList = null; di = null; return newestFile; } catch { fileList = null; di = null; return null; } }

The following method GetExcleSheetNames enumerates all available sheets in the excelFilesupplied as parameter.It uses OLEDDB driver to connect the Excel file and reads the Schema using GetOleDbSchemaTable.

C#

private List<String> GetExcelSheetNames(string excelFile) { OleDbConnection objConn = null; System.Data.DataTable dt = null; try { // Connection String. Change the excel file to the file you // will search. String connString = this.GetConnectionString(excelFile); // Create connection object by using the preceding connection string. objConn = new OleDbConnection(connString); // Open connection with the database. objConn.Open(); // Get the data table containg the schema guid. dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null); if (dt == null) { return null; } List<String> excelSheets = new List<string>(); // Add the sheet name to the collection. foreach (DataRow row in dt.Rows) { string WorkSheet = row["TABLE_NAME"].ToString(); WorkSheet = WorkSheet.StartsWith("'") ? "[" + WorkSheet.Substring(1) : WorkSheet; WorkSheet = WorkSheet.StartsWith("[") ? WorkSheet : "[" + WorkSheet; WorkSheet = WorkSheet.EndsWith("'") ? WorkSheet.Substring(0, WorkSheet.Length - 1) + "]" : WorkSheet; WorkSheet = WorkSheet.Substring(0, WorkSheet.Length - 1).EndsWith("$") ? WorkSheet : WorkSheet.Substring(0, WorkSheet.Length - 1) + "$]"; if (!excelSheets.Exists(delegate(string k) { return (k.ToLower() == WorkSheet.ToLower() || k.ToLower() + "$" == WorkSheet.ToLower()); })) excelSheets.Add(WorkSheet); } return excelSheets; } catch (Exception ex) { return null; } finally { // Clean up. if (objConn != null) { objConn.Close(); objConn.Dispose(); } if (dt != null) { dt.Dispose(); } } }

The following method ScanWorkSheet builds DataSet for first 1000 rows fromWorkSheet(parameter)

C#

private SourceFileDetails ScanWorkSheet(string excelFile, string WorkSheet, string ExtractionStatement, string FindColumn, out bool Success) { Success = false; System.Data.DataSet excelDataSet = new DataSet(); string connectionString = this.GetConnectionString(excelFile); using (OleDbConnection objConn = new System.Data.OleDb.OleDbConnection(connectionString)) { try { objConn.Open(); OleDbDataAdapter cmd = new OleDbDataAdapter("select top 1000 * from " + WorkSheet, objConn); cmd.Fill(excelDataSet, WorkSheet); cmd.Dispose(); } catch { } finally { if (objConn != null) { objConn.Close(); objConn.Dispose(); } } } SourceFileDetails sd = ScanRows(excelDataSet, ExtractionStatement, FindColumn); if (sd != null) { Success = true; sd.FileNamePath = excelFile; sd.ConnectionString = this.GetConnectionString(excelFile, true); return sd; } return (SourceFileDetails)null; }

The following method ScanRows scans each row find our interested columns. This is the way it identifies sheet contains data, and data start address. They key step in this method is build the list of columns/fields we are interested and search for it. Here we assume those fields/columns occur within 1000 rows. 1000 rows are selected here for performance and nobody create an excel file contains data after leaving first 1000 rows as blank.

C#

 public SourceFileDetails ScanRows(DataSet excelDataSet, string ExtractionStatement, string FindColumn) { if (excelDataSet.Tables.Count < 1) return (SourceFileDetails)null; string ExtractFields = this.ReplaceString(this.ReplaceString(ExtractionStatement, "SELECT", ""), "From [Data$]", ""); List<string> FindStrings = ExtractFields.Split(',').Select(s => s.Trim().Replace("[", "").Replace("]", "").Replace("#", ".").Replace("(", "").Replace(")", "")).ToList(); foreach (DataTable dt in excelDataSet.Tables) { List<ItemValueHolder> FoundAddress = new List<ItemValueHolder>(); int iDuplicates = 0; foreach (DataRow dr in dt.Rows) { int iItemColumnIndex = 0; foreach (var fieldValue in dr.ItemArray) { object cellData = fieldValue; string sCellData = cellData.ToString().Replace("#", ".").Replace("[", "").Replace("]", "").Replace("(", "").Replace(")", ""); if (cellData != null) if (FindStrings.Exists(delegate(string k) { return k.ToLower() == sCellData.ToString().ToLower(); })) { if (!FoundAddress.Exists(delegate(ItemValueHolder t) { return t.Item.ToLower() == sCellData.ToLower(); })) FoundAddress.Add(new ItemValueHolder(sCellData, dt.Rows.IndexOf(dr), iItemColumnIndex)); else iDuplicates++; } iItemColumnIndex++; } int iTotalFields = FindStrings.Count - iDuplicates; if (100 * FoundAddress.Count / (float)iTotalFields >= 90.00) { var query = FoundAddress.GroupBy( item => item.ItemValue, (itemvalue, items) => new { Key = (int)itemvalue, Count = items.Count(), }); int dataStartAddress = (from p in query where p.Count == (query.Max(it => it.Count)) select p.Key).Max(); dataStartAddress += 1; string SheetName = "[" + dt.TableName + "$A" + dataStartAddress.ToString() + ":IV]"; string _selectCommand = this.ReplaceString(ExtractionStatement, "[Data$]", SheetName); string reportDate = ""; if (!string.IsNullOrEmpty(FindColumn)) try { int ColIndex = (from p in FoundAddress where p.Item.ToLower() == FindColumn.ToLower() select p.ColumnIndex).First(); DataRow drRowReport = dt.Rows[dataStartAddress]; reportDate = drRowReport[ColIndex].ToString(); } catch (Exception e) { //log  } string NewFields = GetNewFieldsAtSource(FindStrings, dr); return new SourceFileDetails("", "", "", _selectCommand, reportDate, NewFields); } } } return (SourceFileDetails)null; }

The following class is used to store Excel file details that we can use to modify connection strings dynamically. The class is nothing but a information bundle; no other important methods that we use inside this class.

C#

using System;using System.Collections.Generic;using System.Linq;using System.Text;namespace ST_b050e37a03e84ca7b15b853949d29aa6.csproj{ public class SourceFileDetails { private string _fileNamePath = ""; public string FileNamePath { get { return _fileNamePath; } set { _fileNamePath = value; } } private string _fileExtension = ""; public string FileExtension { get { return _fileExtension; } set { _fileExtension = value; } } private string _connectionString = ""; public string ConnectionString { get { return _connectionString; } set { _connectionString = value; } } private string _extractQuery = ""; public string ExtractQuery { get { return _extractQuery; } set { _extractQuery = value; } } private string _reportDate = ""; public string ReportDate { get { return _reportDate; } set { _reportDate = value; } } private string _newFields = ""; public string NewFields { get { return _newFields; } set { _newFields = value; } } public SourceFileDetails() { } public SourceFileDetails(string fileNamePath, string fileExtension, string connectionString, string extractQuery, string reportDate, string newFields) { this.FileNamePath = fileNamePath; this.FileExtension = fileExtension; this.ConnectionString = connectionString; this.ExtractQuery = extractQuery; this.ReportDate = reportDate; this.NewFields = newFields; } }}

The following class is again an intermediate information storage and no other functionality.

C#

using System;using System.Collections.Generic;using System.Linq;using System.Text;namespace ST_b050e37a03e84ca7b15b853949d29aa6.csproj{ public class ItemValueHolder { private string _item = ""; public string Item { get { return _item; } set { _item = value; } } private object _itemValue = ""; public object ItemValue { get { return _itemValue; } set { _itemValue = value; } } private int _columnIndex; public int ColumnIndex { get { return _columnIndex; } set { _columnIndex = value; } } public ItemValueHolder() { } public ItemValueHolder(string item, object itemValue, int columnIndex) { this.Item = item; this.ItemValue = itemValue; this.ColumnIndex = columnIndex; } }}

The following method builds the connection string for the particular excel file supplied. It can build connection string irrespective of data has Headerrow not. Variable HasHeader determines if we have Header row or not.

C#

public string GetConnectionString(string FileNamePath, bool HasHeader) { string ConnectionString = ""; string Extension = Path.GetExtension(FileNamePath).ToLower(); string BinaryExcelProvider = "Microsoft.Jet.OLEDB.4.0"; string XmlExcelProvider = "Microsoft.ACE.OLEDB.12.0"; string BinaryExcelExtProperties = "Excel 8.0"; string XmlExcelExtProperties = "Excel 12.0"; string XmlMacroExcelExtProperties = "EXCEL 12.0 Macro"; string Provider = ""; string ExtendedProperties = ""; switch (Extension) { case ".xls": Provider = BinaryExcelProvider; ExtendedProperties = BinaryExcelExtProperties; break; case ".xlsx": Provider = XmlExcelProvider; ExtendedProperties = XmlExcelExtProperties; break; case ".xlsm": Provider = XmlExcelProvider; ExtendedProperties = XmlMacroExcelExtProperties; break; } string Header = ";HDR=NO;IMEX=1"; if (HasHeader) Header = ";HDR=YES;IMEX=1"; string ConnectionStringFormat = "Provider={0};Data Source={1};Extended Properties=\"{2}{3}\";"; ConnectionString = string.Format(ConnectionStringFormat, Provider, FileNamePath, ExtendedProperties, Header); return ConnectionString; }

All codes, packages(SSIS Project Solution file), sql scripts to generate destination table and sample excel files that can used for testing are attached.

Points of Interest

Multiple files loading with for each file enumerator instead using Search method is also uploaded. Please come back with questions or doubts, I will be glad to help you.

History

v3

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Dynamic Excel file loading with SSIS (2025)

References

Top Articles
Latest Posts
Recommended Articles
Article information

Author: Lidia Grady

Last Updated:

Views: 6235

Rating: 4.4 / 5 (65 voted)

Reviews: 88% of readers found this page helpful

Author information

Name: Lidia Grady

Birthday: 1992-01-22

Address: Suite 493 356 Dale Fall, New Wanda, RI 52485

Phone: +29914464387516

Job: Customer Engineer

Hobby: Cryptography, Writing, Dowsing, Stand-up comedy, Calligraphy, Web surfing, Ghost hunting

Introduction: My name is Lidia Grady, I am a thankful, fine, glamorous, lucky, lively, pleasant, shiny person who loves writing and wants to share my knowledge and understanding with you.