Power Query, a trusted Microsoft Excel feature, is like a smart kitchen assistant that simplifies meal prep by automatically gathering, cleaning, and organising ingredients from various sources so they are ready to be served (loaded into Excel). It adapts easily to different ingredients and requires minimal input to prepare meals consistently.
In contrast, macros are akin to following a rigid recipe that needs to be written down step-by-step. While macros can automate repetitive tasks, they lack the flexibility and adaptability of Power Query. If you change an ingredient or the process, macros can fail without adjustments. Thus, Power Query enhances data preparation with a user-friendly interface and robust transformation capabilities, whereas macros follow precise instructions without the same level of interactivity or adaptability. Note, however, that the Power Query Editor is only available in the desktop version of Microsoft Excel. Read on to see how to use Power Query in Excel for advanced ETL tasks!
What is Power Query?
Power Query is a robust data transformation and connectivity tool integrated into Excel, revolutionising how data is handled. It allows users to connect to various data sources, apply transformations to clean and shape data, and load it back into Excel for further analysis or reporting. With its user-friendly interface and powerful capabilities, Power Query eliminates the need for extensive manual data manipulation or coding. It’s an essential skill for anyone involved in data analysis, enabling them to streamline workflows and focus on generating insights.
Power Query is not just a feature – it’s a comprehensive platform within Excel that can handle complex transformations with simple, user-friendly tools. Moreover, it integrates seamlessly with Power BI, extending its utility for professionals who need consistent data handling across Microsoft’s suite of analysis tools.
Enabling Power Query in Excel
For most Excel versions from 2016 onward, Power Query is built in and accessible under the Data tab. However, if you’re using an earlier version (Excel 2010 or 2013, where Power Query is a separate add-in) or need to confirm its activation, follow these steps:
- Go to File > Options > Add-Ins.
- In the Manage dropdown, select COM Add-ins and click Go.
- Ensure Microsoft Power Query for Excel is checked and click OK.
This step ensures that the tool is available for use, making the transformation and connection processes as straightforward as possible.
How To Access Power Query Editor in Excel?
Follow this step to launch the Power Query Editor to extract, transform and load your data:
- Go to Data > Get Data > Launch Power Query Editor.
The Workflow of Power Query: Extraction / Connection, Transformation, and Loading
Understanding the Power Query workflow helps to streamline your approach to data preparation. Below, we’ll explore the workflow phases and demonstrate Power Query’s capabilities using an example dataset.
Extracting Data:
You can download all the files used in this blog from this folder. Feel free to practice the illustrations shown in this post on your own.
Establish connections to diverse data sources, such as Excel files, databases, CSV files, online platforms, and web data. This phase enables the retrieval of data without manual importing or copying.
- From an Excel Workbook: Navigate to Data > Get Data > From File > From Workbook, select the desired workbook, and preview the data. Then either transform it in the Power Query Editor or load it directly into the main Excel workbook.
- From a CSV File: Select Data > Get Data > From File > From Text/CSV, browse to the file location, and preview the data before importing. Then either transform it in the Power Query Editor or load it directly into the main Excel workbook.
You can also combine multiple files from a single folder, transform any columns or rows if necessary, and then load them!
- From Web Data: Use Data > Get Data > From Web to input a URL and retrieve data from web pages. This is particularly useful for pulling in data such as financial reports or news updates.
- From Databases: Power Query connects to various databases, including SQL Server, Access, and Oracle, allowing for robust data extraction and manipulation.
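To make the folder-combine idea above concrete, here is a minimal Python sketch of the same logic, useful mainly as an outside cross-check rather than as a picture of how Power Query works internally. The file names and sample values are hypothetical, and in-memory text stands in for files on disk. Each row is tagged with the file it came from, much like the “Source.Name” column that appears in the combined sales data later in this post.

```python
import csv
import io


def combine_csv_texts(files):
    """Stack rows from several CSV texts, tagging each row with its source name.

    `files` maps a file name to that file's CSV text (a stand-in for reading
    every file found in one folder).
    """
    combined = []
    for name, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["Source.Name"] = name  # remember which file the row came from
            combined.append(row)
    return combined


# Hypothetical monthly files with identical headers.
files = {
    "Sales_January.csv": "OrderID,Product,Price Each\n10001,USB-C Cable,11.95\n",
    "Sales_February.csv": "OrderID,Product,Price Each\n10002,Monitor,149.99\n",
}
rows = combine_csv_texts(files)
```

The sketch assumes every file shares the same header row; files with differing columns would need extra handling.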
Connecting to a SQL Server Database in Power Query
Prerequisites:
- SQL Server Database: You’ll need access to a SQL Server database.
- SQL Server Driver: Ensure you have the appropriate SQL Server driver installed on your system.
- Power Query: This is built into Excel and other Microsoft tools like Power BI.
Steps:
- Open Power Query Editor:
- In Excel, go to the “Data” tab and, in the “Get & Transform Data” group, click “Get Data”.
- Select “From Database” > “From SQL Server Database”.
- Connect to the Server:
- Server Name: Enter the server name or IP address of the SQL Server instance.
- Database Name: Select the database you want to connect to.
- Authentication: Choose between Windows Authentication (if you’re using Windows authentication) or SQL Server Authentication (requires a username and password).
- Click “OK” to proceed.
- Navigate to the Table:
- Once connected, you’ll see a list of tables and views in the database.
- Select the table you want to query, preview the data, and click either Load or Transform Data.
- Preview and Transform Data:
- You can now preview the data in the Power Query Editor.
- Apply transformations like filtering, sorting, adding columns, and more.
Transforming Data:
This phase involves applying modifications such as removing unnecessary columns, filtering rows, changing data types, and adding calculated columns. Power Query uses a step-based approach that records every change, making it easy to trace back and adjust transformations.
To fully leverage the Power Query Editor for data preparation, it’s essential to understand its layout. Let’s look at the key components of the Power Query Editor interface and how to use them effectively:
We will be using the sales data that we saw above in our Power Query Editor to understand how to perform transformations.
1. Ribbon
The ribbon at the top of the Power Query Editor window provides various tabs with commands grouped based on their functions. Here are the main tabs:
- Home Tab: This tab includes common actions such as Remove Columns, Keep Rows, Replace Values, Manage Columns, and other essential transformations. It also has options for closing and loading the data after transformations.
For example, let’s say I would like to split the address into 3 separate columns: Street, City, and Pincode. Furthermore, I wish to get rid of the “Source.Name” column, as I can always check the “Order Date” column to see which month’s data I’m looking at. I also wish to remove any duplicate or blank rows. Take a look at the video below to see how it’s done using Power Query. Notice that the formula bar shows the formula version of each step you perform manually through the built-in commands.
- Transform Tab: This tab offers more advanced data manipulation options like Pivot Column, Unpivot Columns, Group By, and Format for data cleaning and reshaping.
For example, we could use Group By to find the sum of sales across different cities. But before summing up the Price values, we must convert the data from the default Text format to a more appropriate number/decimal format, if Power Query has not already done so automatically.
However, if you come across the error below, it means that a row in the OrderID column doesn’t conform to the data type set by Power Query. Power Query sometimes changes column data types automatically during import, through a feature called automatic data type detection: when it first loads data, it picks the most appropriate data type for each column based on the values it sees in the first rows of the data (the first 200 by default).
The error may have arisen because your data includes a header row mixed in with the actual data, i.e., there is a row with the value “OrderID” alongside rows with values like “10001”, “10002”, etc. Since “OrderID” isn’t a number, the requested Group By and sum couldn’t take place. To resolve this error, simply remove any blank or header rows when you first load the data. Here’s how you can do it:
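In the Power Query Editor this is a matter of filtering the OrderID column. As a rough illustration of what that filter does, the following Python sketch drops blank rows and stray header rows from a list of records. The sample rows are hypothetical, and the function name is made up for this example; it is a sketch of the logic, not Power Query’s own implementation.

```python
def drop_header_and_blank_rows(rows, key="OrderID"):
    """Keep only rows whose key column holds real data.

    A row is dropped when the key is empty (blank row) or equals the column
    name itself (a header row repeated inside the data).
    """
    cleaned = []
    for row in rows:
        value = (row.get(key) or "").strip()
        if value == "" or value == key:
            continue
        cleaned.append(row)
    return cleaned


# Hypothetical rows, as they might look after combining several files.
raw = [
    {"OrderID": "10001", "Price Each": "11.95"},
    {"OrderID": "OrderID", "Price Each": "Price Each"},  # stray header row
    {"OrderID": "", "Price Each": ""},                   # blank row
    {"OrderID": "10002", "Price Each": "149.99"},
]
clean = drop_header_and_blank_rows(raw)
```

Once rows like these are gone, converting OrderID to a whole number no longer fails on the text value “OrderID”.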
Transformation in Power Query Using Google Gemini AI:
We can also achieve this by typing in the formula manually in Advanced Editor. I asked Gemini AI to give me the formula for the same and here’s what we got:
Prompt: “I have a dataset with columns like “Purchase Address – City” and “Price Each.” I want to group the data by “Purchase Address – City” (assuming it includes only city information) in excel using Power Queries. After grouping by city, I want to calculate the total purchase amount for each city by summing the “Price Each” values within each group. Can you tell me the Power Query formula for this”
Result:
As you saw, we got exactly the same results in both cases! Since Gemini AI is trained on a massive dataset of text and code, including the Power Query M language, it can handle complex data manipulation tasks and often provides accurate solutions, though it is still worth checking its output against your data. To make your job easier, you can also upload images of your dataset to the Gemini AI interface and make use of its data analysis capabilities.
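Whichever route you take, it helps to confirm the grouped totals independently. The Python sketch below mirrors the Group By logic from the prompt (group on the city column, sum “Price Each”), assuming the prices arrive as text; the sample rows are hypothetical. It is meant as a cross-check on a small sample, not as a substitute for the Power Query step.

```python
from collections import defaultdict
from decimal import Decimal

CITY_COL = "Purchase Address – City"  # column names taken from the prompt above
PRICE_COL = "Price Each"


def total_by_city(rows, city_col=CITY_COL, price_col=PRICE_COL):
    """Sum the price column within each city group."""
    totals = defaultdict(Decimal)  # a missing city starts at Decimal("0")
    for row in rows:
        # Decimal avoids binary floating-point rounding when adding prices.
        totals[row[city_col]] += Decimal(row[price_col])
    return dict(totals)


# Hypothetical sample rows with prices stored as text.
sales = [
    {CITY_COL: "Boston", PRICE_COL: "11.95"},
    {CITY_COL: "Dallas", PRICE_COL: "149.99"},
    {CITY_COL: "Boston", PRICE_COL: "3.05"},
]
totals = total_by_city(sales)
```

If a price cell holds non-numeric text (for example a stray header value), `Decimal` raises an error, which is the same kind of failure the data type conversion surfaces in Power Query.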
- Add Column Tab: This tab allows you to create new columns with options like Custom Column, Conditional Column, and Add Index Column.
Let’s say you want to categorize the sales into “Low Value”, “Medium Value”, and “High Value” based on certain thresholds and add the category as another column.
- View Tab: Provides options to manage the interface, such as toggling between query settings and previewing applied steps.
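Returning to the sales categorization example under the Add Column tab, the rule behind such a Conditional Column can be sketched in Python as follows. The thresholds (50 and 500) and the “Sale Category” column name are assumptions chosen purely for illustration; the post does not specify them.

```python
def categorize_sale(price, low_limit=50.0, high_limit=500.0):
    """Return a label for a sale, using hypothetical thresholds."""
    if price < low_limit:
        return "Low Value"
    if price < high_limit:
        return "Medium Value"
    return "High Value"


def add_category_column(rows, price_col="Price Each"):
    """Return new rows with an extra 'Sale Category' column."""
    return [{**row, "Sale Category": categorize_sale(row[price_col])} for row in rows]


# Hypothetical rows with prices already converted to numbers.
sales = [{"Price Each": 11.95}, {"Price Each": 149.99}, {"Price Each": 999.99}]
labelled = add_category_column(sales)
```

As in a Conditional Column, the conditions are evaluated top to bottom and the first match wins, so the order of the checks matters.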
2. Query List Pane
On the left side of the window is the Query List Pane, which displays all the data queries within your workbook. This pane helps manage multiple data sources or queries, showing their relationships and statuses. You can rename, duplicate, or delete queries as needed.
Example: We have separate queries for “Schools in Karnataka” data, “Plastic Waste Data Around The World” data and “Sales Data” that we loaded from different sources earlier. All of them will be listed here, enabling you to switch between them easily.
3. Data Preview Area
The central part of the editor is the Data Preview Area, showing a snapshot of the data after each applied transformation. This view updates in real-time as you perform actions, giving immediate feedback on the changes made.
4. Applied Steps Pane
On the right side, the Applied Steps Pane records each transformation step as you make it. This step-by-step record allows you to track, revisit, and edit any change. It works somewhat like recording a macro, except that each recorded step is a simple data operation you can inspect and adjust.
Key Points:
- Each step is named automatically (e.g., Filtered Rows, Removed Columns), but you can rename steps for clarity.
- Clicking on a step shows how the data looked at that point, enabling easy correction if needed.
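One way to picture the Applied Steps idea is as an ordered list of named functions, where the table produced by each step is kept so you can look at any intermediate state. The Python sketch below is only an analogy under that assumption; the step names follow Power Query’s automatic naming, and the sample rows are hypothetical.

```python
def run_steps(source, steps):
    """Apply named steps in order, keeping the table produced by each one."""
    history = [("Source", source)]
    table = source
    for name, transform in steps:
        table = transform(table)
        history.append((name, table))
    return history


# Hypothetical sample rows.
source = [
    {"OrderID": "10001", "Price Each": "11.95"},
    {"OrderID": "", "Price Each": ""},
    {"OrderID": "10002", "Price Each": "149.99"},
]
steps = [
    ("Filtered Rows", lambda t: [r for r in t if r["OrderID"] != ""]),
    ("Changed Type", lambda t: [{**r, "Price Each": float(r["Price Each"])} for r in t]),
]
history = run_steps(source, steps)

# Inspect the row count after each step, like clicking through the pane.
for name, table in history:
    print(name, len(table))
```

Because each step builds a new list instead of modifying the previous one, the earlier states remain available for inspection, which is what makes tracing back a bad step straightforward.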
5. Formula Bar
The formula bar, located just above the data preview, displays the M code for each applied step. This code provides an additional layer of customization for advanced users familiar with the M language.
6. Status Bar
At the bottom, the status bar shows the number of rows previewed and indicates whether the data sample is limited (e.g., “First 1,000 rows”). This helps manage performance, especially for large datasets.
Loading Data:
Once data is ready, it can be loaded back into Excel or added to the Data Model for use in pivot tables, charts, and dashboards.
Exploring the Capabilities of Power Query
Power Query’s functionality extends beyond simple data imports. It’s a powerful engine that enables you to handle complex data preparation tasks with ease. Here are some of its standout capabilities:
Merging and Appending Data: You can merge two or more tables based on common columns or append tables to create a larger dataset.
a) Append: Stack data from multiple sources, useful if managing quarterly project updates. This is similar to the VSTACK function in Excel.
- Select the Tables: Choose the tables you want to append.
- Append Queries: Go to the “Home” tab and click “Append Queries.”
b) Merge: Combine data from different tables by matching a key column, like ProductID.
- Load Your Data: Import the two tables (e.g., “Electronic_Products_Info” and previously combined “Combined_Sales”) into Power Query.
- Select the Table to Merge: Choose the table that will be the “base” table (e.g., “Combined_Sales”).
- Merge Queries:
- Go to the “Home” tab and click “Merge Queries.”
- Select the other table (“Electronic_Products_Info”) as the “Second table.”
- Specify the common column for the merge (e.g., “ProductID”).
- Choose the join kind (e.g., left outer join to keep all rows from the first table).
- Expand the Merged Table:
- Right-click or click on the expand symbol on the newly created column (e.g., “Products”) and select “Expand”.
- Choose the columns you want to expand (e.g., “ProductName,” “Category”).
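The append and merge steps above can be sketched in Python to show what each operation does to the rows. The table contents below are hypothetical, and the merge assumes that “ProductID” is unique in the product table; a full left outer join would repeat a sales row once for every matching product row.

```python
def append_tables(*tables):
    """Stack tables on top of each other, in the order given."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def left_outer_merge(left, right, key, columns):
    """Keep every row of `left`; add `columns` from the matching `right` row.

    Rows without a match get None in the expanded columns.
    Assumes `key` is unique in `right`.
    """
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        expanded = {column: match.get(column) for column in columns}
        merged.append({**row, **expanded})
    return merged


# Hypothetical data standing in for the tables named in the steps above.
q1_sales = [{"ProductID": "P1", "Quantity": 2}, {"ProductID": "P2", "Quantity": 1}]
q2_sales = [{"ProductID": "P9", "Quantity": 5}]
products_info = [
    {"ProductID": "P1", "ProductName": "Monitor", "Category": "Displays"},
    {"ProductID": "P2", "ProductName": "USB-C Cable", "Category": "Accessories"},
]

combined_sales = append_tables(q1_sales, q2_sales)
merged = left_outer_merge(combined_sales, products_info, "ProductID", ["ProductName", "Category"])
```

Here the sale for “P9” survives the merge with empty product details, which is the behaviour to expect from a left outer join when the second table has no matching row.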
VBA Macros vs Power Query: The Difference
Macros (VBA) and Power Query can both be used for automating and streamlining data processes in Excel, but their approaches and strengths differ significantly. VBA excels in scenarios requiring comprehensive automation, such as building complex reports, creating user interactions, or manipulating data across multiple sheets and applications. It’s highly flexible but requires coding knowledge, which can present a steeper learning curve.
Power Query, on the other hand, is ideal for tasks related to data import, cleaning, and transformation. It’s user-friendly, using a step-by-step interface to allow non-programmers to efficiently manage data from diverse sources. The main difference lies in the complexity and scope: VBA handles broader automation tasks beyond data preparation, whereas Power Query is optimized for data extraction, transformation, and loading (ETL) processes.
For repetitive data preparation tasks, Power Query’s traceable steps and automatic refresh are beneficial, while VBA is better suited when custom functions or multi-application workflows are needed.
Feature/Aspect | Macros (VBA) | Power Query |
Purpose and Use Case | Automates tasks including formatting, reporting, and complex operations involving other Office applications. | Automates data connection, cleaning, and transformation without coding; ideal for ETL tasks. |
Learning Curve | Steeper; requires programming knowledge of VBA. | Easier; user-friendly interface with point-and-click steps. |
Ease of Use | Best for those with coding experience. | More accessible to non-technical users and those comfortable with GUI tools. |
Flexibility | Highly flexible; supports custom functions and interactions across Excel and other Office apps. | Focused on data import and transformation; lacks broader automation capabilities. |
Functionality | Suitable for multi-step automation, user forms, and custom scripts. | Ideal for data merging, cleaning, and reshaping, but not broader Excel interactions. |
Error Handling | Provides debugging tools such as step-by-step code execution and variable inspection. | Step-based previews make it easier to identify and correct issues during transformation. |
Performance | Dependent on code optimization; direct cell manipulation can slow down large operations. | Typically more efficient with large datasets due to memory-based operations and optimized transformations. |
Maintenance | Requires coding knowledge to update and maintain. | Easier to modify and maintain due to a step-by-step transformation log. |
Best Use Cases | Complex report formatting, multi-application tasks, and advanced user interactions. | Data import, cleaning, merging, and shaping for analysis and reporting. |
Limitations
Complex Data Types: Power Query doesn’t always handle complex data types as efficiently as other tools. For example, handling nested JSON or hierarchical structures can be cumbersome and may require multiple steps or custom coding.
Limited Data Model Features: Power Query allows for transforming and loading data, but it doesn’t have advanced modelling capabilities like creating relationships between tables, which is handled by the Data Model in Excel or Power Pivot. You’ll need to use Power Pivot for that.
Key Tips for Power Query Users
- Name Your Query Steps: When you apply transformations, Power Query automatically names each step (e.g., “Changed Type”, “Filtered Rows”). Renaming these steps makes your query easier to follow, especially when revisiting or sharing with others.
- Minimize Data for Performance: When working with large datasets, avoid loading unnecessary columns or rows. Power Query performs better and is more manageable with smaller data sets during transformations.
- Use Query Folding: Query folding is when Power Query pushes transformations back to the data source, allowing for faster processing. For database sources like SQL Server, try to use transformations that can be folded to improve performance.
- Keep Queries Organized: As your data models grow more complex, keep related queries in folders and apply consistent naming conventions for clarity.
- Handle Errors Gracefully: Errors can occur during data transformations, especially when dealing with mixed data types. Because Power Query is adaptive and can add a couple of transformations on its own after studying the data, these can sometimes cause issues in subsequent steps. If you know the M language, one way to tackle this is to review the code and remove any unnecessary parts. Otherwise, you could use the Replace Errors feature to manage errors efficiently, or create conditional logic to bypass problematic rows. And of course, in case of any errors, Google Gemini AI is at your disposal to help you debug them!
- Quality Control Check: Power Query’s built-in functions work largely as a black box, so backtracking or understanding exactly what each one does requires thorough knowledge of the M language. To check data correctness efficiently, you could use another data analysis tool, or Excel itself with a smaller data sample, to cross-verify that Power Query gave you the correct results.
- DataFormat.Error: If you ever come across a DataFormat.Error when doing some transformations, know that this indicates an issue with data type mismatches or invalid values. To resolve this:
a) Check Data Types: Ensure columns have correct data types (e.g., text, number, date). Use the “Change Type” function to correct errors.
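b) Handle Missing Values: Use Replace Values (Table.ReplaceValue) to substitute a default for null values, Fill Down (Table.FillDown) to carry the previous value forward, or List.RemoveNulls to drop nulls from a list.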
c) Convert Text to Numbers: Use the Number.FromText function to convert text representations of numbers.
d) Handle Date and Time Formats: Use the Date.FromText and Time.FromText functions to convert text strings to date and time values.
e) Clean Data Source: Ensure the data source is clean and consistent.
f) Use Formula Bar: Inspect and edit M code directly.
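The DataFormat.Error advice above boils down to converting text defensively and deciding what to do with values that cannot be converted. The Python sketch below illustrates that idea with a fallback value in place of an error. It is not the M functions themselves: the helper names are made up for this example, and the date helper assumes ISO-style dates (YYYY-MM-DD).

```python
from datetime import date


def to_number(text, default=None):
    """Convert text such as '1,299.50' to a float, or return `default`."""
    try:
        return float(text.strip().replace(",", ""))
    except (AttributeError, ValueError):  # None, or text that is not a number
        return default


def to_date(text, default=None):
    """Convert an ISO-formatted date string, or return `default`."""
    try:
        return date.fromisoformat(text.strip())
    except (AttributeError, ValueError):  # None, or an unexpected date format
        return default


# Hypothetical cell values, including ones that would trigger a conversion error.
prices = [to_number(v) for v in ["11.95", "1,299.50", "OrderID", "", None]]
dates = [to_date(v) for v in ["2024-01-15", "15/01/2024"]]
```

Returning a default instead of failing plays a similar role to Replace Errors: the problematic cells are still visible (here as `None`) and can be reviewed or filtered afterwards.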
Conclusion: Maximizing Data Preparation with Power Query
Power Query transforms how users approach data preparation in Excel. By mastering its functionalities, you can connect to multiple data sources and clean, transform, and prepare data in a way that is both efficient and repeatable. The integration of M code further expands the possibilities, empowering users to build complex transformations and automate data workflows. Whether you’re a business analyst, data scientist, or Excel enthusiast, Power Query can elevate your data analysis, making your work faster, more effective, and more insightful.
Start experimenting with Power Query today and unlock a new level of productivity within Excel!