fncCombineSheets

Beyond Excel's (BXL's) Function:

Combine Worksheets

Name: fncCombineSheets()


Use:

Combine all sheets in a workbook into one table. 


Description:

It is a common practices to have workbooks where each worksheet contains data for a customer or month. For analysis and reporting the modeler may want each worksheet's data placed into one table. This function does that.


Assumptions:


Parameters

This function has four parameters: sWorkbookPath, lColumnCount, ExcludeSheets, and ExcludeTitles .


Example:
In the sample workbook is a query called "Invoked Function" which looks something like below. It demonstrates how to call this function.

let

    Source = fncCombineSheets("C:\Customers.xlsx", 5, "Customer:")

in

    Source


Code:
Copy this code and paste it  into PQ's advanced editor.

/*  Name: fncCombineSheets

    Description: Combine all sheets in a workbook into one table. 

    Requisites:  The workbook must be saved

                 All worksheets in the workbook have identical formats

                 The worksheet name identifies what the data pertains to (customer, month, etc.)

    Inputs:      WorkbookPathWorkbook's full name with path

                 ColumnCount:   Number of columns containing data in each worksheet

                 ExcludeSheets: Worksheets to exclude from combining 

                 ExcludeTitles: First few characters of worksheet titles to exclude from the data

    Outputs:     A single table containing all data from all worksheets

    Example:     Source = fncCombineSheets("C:\MyCustomers.xlsx", 5, "Sheet1,Sheet2", "Customer:")


       Date  Ini Description

    02/01/23 CWH Original Development

    02/02/23 CWH Added ExcludeSheets

    02/04/23 CWH Added ExcludeTitles

*/


(WorkbookPath as text, ColumnCount as number, ExcludeSheets as text, ExcludeTitles as text) =>


let

//  Open Excel Workbook 

    Source          = Excel.Workbook(File.Contents(sWorkbookPath), null, true),


//  Select Sheets

    RmvSheetSpaces  = Text.Replace(Text.Replace(ExcludeSheets, " ,", ","), ", ", ","),

    SheetList       = Text.Split(RmvSheetSpaces, ","),  /* Convert CSV to List */

    SelSheets       = Table.SelectRows(Source, 

                        each not(List.Contains(SheetList,[Name])) and [Kind]="Sheet"),


//  Only keep "name" and "data" columns. Name contains sheet name, Data contains worksheet data

    RmvCols         = Table.SelectColumns(SelSheets,{"Name", "Data"}),


//  Create a list of column names ("Column1", "Column2", etc.)

    ColHdrs         = List.Generate(() => 1, 

                                    each _ <= lColumnCount, 

                                    each _ + 1, 

                                    each "Column" & Number.ToText(_)),


//  Expand the data (worksheet columns) 

    ExpData         = Table.ExpandTableColumn(RmvCols, "Data", ColHdrs),


//  Remove excluded Title rows

//  fnBeginsWith credit: Sergei Baklan - This is like List.Contains but only looking at the first characters

//   https://techcommunity.microsoft.com/t5/excel/power-query-filter-table-based-on-list-with-wildcards/m-p/2418638

    fnBeginsWith    = (list, str) => List.Contains(

                        List.Transform(list, each Text.StartsWith(str, _)),

                        true),

    RmvTitleSpaces  = Text.Replace(Text.Replace(ExcludeTitles, " ,", ","), ", ", ","),

    TitleList       = Text.Split(RmvTitleSpaces, ","),    /* Convert CSV to List */

    RmvTitles       = Table.SelectRows(ExpData, 

                        each not fnBeginsWith(TitleList, [Column1])),


//  Remove Empty Rows

    RmvNulls        = Table.SelectRows(RmvTitles, 

                        each List.NonNullCount(Record.ToList(_)) <> 1),


//  Promote column headings

    PromoteColHdrs  = Table.PromoteHeaders(RmvTabHdrs, [PromoteAllScalars=true]),


//  Rename first column to "Sheet"

    FirstCol        = Table.ColumnNames(PromoteColHdrs){0},

    RnmCols         = Table.RenameColumns(PromoteColHdrs,{{FirstCol, "Sheet"}}),


//  Remove repeated column headings from other worksheets

    SecondCol       = Table.ColumnNames(RnmCols){1},

    RmvDupColHdrs   = Table.SelectRows(RnmCols, 

                        each not(List.Contains({SecondCol}, Record.Field(_, SecondCol))))


in

    RmvDupColHdrs

Download:
The workbook at right contains sample data and this function. To download it:

You should see the results it produces.

CombineSheets.xlsx