Purpose
An essential foundation of data management is the two-dimensional table structure of fields and records. It is an elegant and simple way to store and retrieve data. Intuitively, it mirrors how we remember and communicate information.
Think of all the people you know, or scroll through your phone contacts. Based on fields that describe significant attributes of a person – name, address, birthday, etc. – each contact is stored, in your phone’s memory or your own, as a record. Because they all possess similar attributes – with different attribute values, you can store hundreds of them and easily find the unique one when you need it.
Columns and rows are the DNA of Excel, yet despite this implicit invitation to adopt fields-and-records structures, the opportunity is too often lost. Users jump to visuals, treating the sheet like a whiteboard. Blocks of disconnected data are separated by blank or colored columns and rows. With no logical framework, formulas are built cell by cell with anonymous, cryptic references.
According to Excel’s big brother Access, the data table is to be protected, to ensure the security and integrity of the source data. To input data, you create a form; to output data, you create queries and reports. You don’t mess with the tables. Access clearly separates the duties of data management.
Excel is less strict about the rules and more popular than his brother, and has had some discipline problems. He’s matured a lot and become more responsible about data management. To draw out this better nature we can use tables.
When you define a range of data as a table you are taking a step towards principled design. The table gently guides you back to the two-dimensional data model with these standard features:
A single row of unique text column headings = field names
A contiguous region of data rows below the headings = records
Boundaries that adjust automatically as data are added or deleted
An intuitive reference model
Tables are more than a convenience and will become your favorite tool to manage data, from complex datasets to short, single-column lists. To convert a range to a table, press Ctrl+T to apply the default table format, or Alt-H,T to browse styles.
Table Referencing
The table reference model begins with the header row to define unique field names. If you try to reuse a name, the table will automatically append a number to make it unique. To avoid this, define the fields and make the names unique from the start.
Within the table, a field is referenced with square brackets as [Field]. This refers to the entire data field, all table rows excluding the header row. This gives the data set a meaningful name, making table formulas transparent and easy to read, such as =AVERAGE(Data[Income]).
For the rows there is a special reference for the current row. To refer to another field in the same record, use [@Field]. @ means intersection of the current row and the [Field] column. You might see @ automatically appear in other contexts outside of tables if there’s an implicit intersection, but it won’t affect calculation. Within the table it is intentional and explicit.
There are special notations for each part of the table. Review the list below, and practice with a sample table, navigating all around the table to see what kind of reference you get in each area. Note that OFFSET allows reference to the current cell, which would otherwise be a circular reference error.
To become fluent with Table referencing, practice navigation using the square bracket notation. When you type “=[“, a picklist of all the Table’s field names opens for convenient selection. At the top of the list is the “@ - This Row” option. Tab to select it, and only field names remain. Arrow down to the field you want, and Tab to enter it.
The fields are listed in the order entered, not alphabetically. This is helpful because primary key fields are most likely the first columns, and likely to be target references. The list is also filtered as you begin to type.
To step through an example, imagine you have a table, or create one, in which the first two columns are Name and DOB (date of birth), and you want to calculate Age. You begin typing your formula, and when you’re ready for the DOB, hit five keys: [ Tab ↓ Tab ]. The [ opens the field list; Tab enters @; ↓ selects DOB; Tab enters it; and ] closes the reference. With practice, this becomes an efficient sequence for formula entry.
For your interest, perfect precision requires a longer formula for Age, but this one is simple and very accurate: =ROUNDDOWN((TODAY()-[@DOB])/365.25,0).
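The same arithmetic can be sketched in Python for illustration (the function name and dates are hypothetical, not part of the workbook). Excel stores dates as serial numbers, so TODAY()-[@DOB] yields elapsed days; dividing by the average year length and truncating gives the age:

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Mirror of =ROUNDDOWN((TODAY()-[@DOB])/365.25,0):
    elapsed days divided by the average year length, truncated."""
    days = (today - dob).days
    return int(days / 365.25)  # int() truncates toward zero, like ROUNDDOWN with 0 digits

# Someone born 1990-06-15 is 34 as of 2025-01-01
print(age_from_dob(date(1990, 6, 15), date(2025, 1, 1)))  # 34
```

The 365.25 divisor averages out leap years, which is why the result is very accurate without being perfectly precise around birthdays.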
Table Behavior
The table likes to stick together. Think of the table edge as a rubber band that expands and contracts as columns and rows are inserted or deleted. Practice using it to select table contents. There’s a fine line between selecting a data column, a table column and a spreadsheet column. Place a table with the header in row 1 to practice the following selection methods.
Hover over a table column heading. Slowly move the mouse up until it first becomes a down arrow. Click to select the table data column. This is a good trick for selecting the field as a function argument. Click again to add the header cell to the selection. Now you can drag the column to rearrange the order, or Ctrl+ to insert a column. Continue moving up until it becomes a down arrow the second time, when it’s above the sheet column heading. Click to select the entire sheet column.
Likewise with rows, practice selecting records from the left table edge. Once a record is selected you can drag it up or down to manually sort. When doing so, notice the subtly different drag behavior in the table versus an open range.
When you drag a range on the open sheet and drop it onto a filled range, the default behavior is to move the data, that is, to cut and paste, overwriting (with your permission) any data in the target range. By holding down Ctrl as you drop, you can change cut to copy, and by holding down Shift you can change overwrite to insert, pushing the existing content down or right.
In the table, the default is cut and insert, like rearranging the pieces of a picture puzzle. You can still change the default with Shift and Ctrl. Table drag and drop only works with complete columns and rows. You cannot drag a single cell or group of cells that does not extend to the table edges.
Finally, hover over the top left corner of the table, until the cursor becomes a diagonal arrow, then click to select the data area, and again to select the entire table with headers.
With the active cell within the table, hit Ctrl+ to insert a table row or Ctrl- to delete a row. This is a nice shortcut, bypassing the dialog you would get on the open sheet. Particularly if you have multiple tables, or table and non-table data next to each other, it inserts or deletes the row in the table only, without affecting the rest of the sheet. In effect, the table is like a sheet within a sheet. Watch out for data below the table, which can be affected. As you will see in the Projects, tables can be stacked if the lower table has the same number of columns as the upper table, or fewer.
When you enter a formula in a data column, make sure the rest of the column is empty. The table will graciously copy the formula to the entire column, even to new rows inserted or added at the bottom. It will not prevent you from later typing a number or a different formula into any row. Error checking will warn you of the inconsistency, and you will learn discipline.
Adopting tables gets you into the mindset of robust structural design. While less formal than a relational database table, the Excel table offers integrity, elegance and transparency. The expectation is that you are intentionally structuring your data to exploit these advantages and will respect their humble requirements.
Treat each table row as a record. Keep each column a consistent data type. Use a table format from the gallery or create your own. If using Conditional Formatting, apply it to the whole table or whole columns, never format individual rows. To manually highlight rows, insert a small flag column, such as a single digit or letter, and base the format on it. Apply formulas to whole columns, never to individual rows. Don’t insert blank columns or rows or otherwise insult the table, and it will treat you well.
Pivot Tables
Pivots are powerful tools for analyzing significant amounts of data. They are a great payoff for the effort to structure your data. If you have tabular data you can use a pivot to cross-reference any two or more fields.
In a table, columns are fields and rows are records, period. With a pivot, you discover relationships between fields. You can put one field in rows like the table, another in columns, and summarize a third at each column-row intersection with statistical functions. Thus you pivot your view of the data.
Take an example from an Order detail table, with columns for Customer, Item, Quantity and Price. If you pivot Customer into rows and Item into columns, and sum Revenue = Quantity * Price in the body, you get a statistical summary of customers, of items, and of who orders what.
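To make the pivot mechanics concrete, here is the same cross-reference sketched in plain Python with hypothetical sample data: each (Customer, Item) cell accumulates Quantity * Price, which is exactly what the pivot's Sum of Revenue does at each row-column intersection.

```python
from collections import defaultdict

# Hypothetical order-detail records: one dict per order line
orders = [
    {"Customer": "Acme", "Item": "Widget", "Quantity": 10, "Price": 2.5},
    {"Customer": "Acme", "Item": "Gadget", "Quantity": 3,  "Price": 8.0},
    {"Customer": "Bolt", "Item": "Widget", "Quantity": 4,  "Price": 2.5},
    {"Customer": "Acme", "Item": "Widget", "Quantity": 2,  "Price": 2.5},
]

# Pivot: Customer in rows, Item in columns, Sum of Revenue in the body
pivot = defaultdict(float)
for row in orders:
    pivot[(row["Customer"], row["Item"])] += row["Quantity"] * row["Price"]

print(pivot[("Acme", "Widget")])  # 10*2.5 + 2*2.5 = 30.0
```

Notice that the summary collapses many records into one cell per field pair; that aggregation is where the pivot's power to expose patterns comes from.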
As another example, think about monthly records. It’s common to see accounts in rows and months in columns. This is a quick and easy entry method, but “Dec 2020” is not a field, it’s a value. Good data management reserves columns for fields, which can then be pivoted. You can’t pivot data that’s already pivoted.
With month as a column, data entry takes amount, account and month. The pivot can then select the date range with months as columns, if desired. This entails more table rows, but so what? With pivot power for data display, you specialize the table for data storage. Rows are cheap.
Choose pivot column fields carefully. You don’t want so many columns that you have to scroll left and right a lot. In the Order table example, you might have product category that would fit better in the columns range. The power of the pivot is in summarization, to expose patterns and trends. If you get too granular, you may as well just read the table.
The Keyboard Trainer is an interesting example of a pivot used for selection rather than output. The Team Calendar Analysis Sheet has output pivot tables and charts.
Data Joins
Integrated data models are split into multiple tables. If you are entering sales orders, you don’t want to type in the customer information and product information as you enter each order. You want these data to be stored in other tables, so that you can look them up and retrieve their details, saving effort, avoiding errors, and saving storage space.
Database structure requires that every table has a key that makes each record unique. When you use a value from a table field to look up other fields for the same record, you first need to know that the lookup value is unique within the lookup field. This is an essential principle of data management.
A key can be a single field or a combination of fields. For example, a customer table has a unique account number; a product table has a unique product code; an order table puts customers and products together, and has a unique order number; an inventory table may have material and location numbers, which are not unique in themselves, but are unique when combined.
Table joins come in different types; the most common is many-to-one. The order table can have many records with the same customer, all joined to a single record in the customer table. This enables efficient and secure data management. The first step to creating a coherent data model is to learn these principles, take stock of table and field names and keys, and draw a diagram of table/field joins.
Table links can be set up in the Data Model via the Relationship Manager, Alt-A,A. Use the “Create Relationship” dialog (no, that does not refer to pickup lines) to connect tables by linking a “table / foreign key” (the “many” side) to a “related table / primary key” (the “one” side). Each table must have its own unique primary key, which can also be called the lookup column. The Data Model can then be used to generate pivot tables with all fields from both tables.
In the Order table, Order[Number] is the primary key. In the Customer table, Customer[Account] is the primary key. Order[Account] is a foreign key because it resides in the Order table, but its values come from the Customer table.
Tables can also be joined by lookup functions. There are various ways to look up data. The first one learned is usually VLOOKUP. I prefer INDEX-MATCH because it describes precisely what I’m doing, and because the input ranges are more direct and flexible. I can MATCH on one column and INDEX on another without concern for their relative positions. MATCH type 0 finds the exact ordinal (nth) record number of the unique value within the lookup field, passes it to INDEX, and INDEX uses that to retrieve the value of any other field in the same record. For example, if Account is a named input cell and Name is an output cell,
Name = INDEX(Customer[Name],MATCH(Account,Customer[Account],0))
Read it as: "Look up this Customer's Account and give me their Name."
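The division of labor between MATCH and INDEX can be sketched in Python (the customer data here is hypothetical): MATCH finds the ordinal position of the key within one column, and INDEX uses that position to read any other column of the same record.

```python
# Hypothetical Customer table held as parallel column lists
customer_account = ["C-101", "C-102", "C-103"]
customer_name    = ["Alice", "Bob", "Carol"]

def lookup_name(account: str) -> str:
    """INDEX(Customer[Name], MATCH(account, Customer[Account], 0))"""
    n = customer_account.index(account)  # MATCH type 0: exact-match position
    return customer_name[n]              # INDEX: same record, another field

print(lookup_name("C-102"))  # Bob
```

Because the position is computed on one column and applied to another, the two columns need not be adjacent or in any particular left-to-right order, which is the flexibility INDEX-MATCH has over VLOOKUP.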
You can choose to join the tables via the Data Model, or just retrieve a few columns from the reference table to add to the analysis table. Think about your purpose and which table you’re most interested in. In either case, joining tables requires a single key field. If you have a multi-field key used in multiple tables, concatenation is a simple way to turn it into a single field for easy lookup.
For example, you have city-level data, geographic characteristics in table Geo and demographic data in table Demo. The key to both tables would be state and city together, since neither is unique by itself, but the combination is unique. To link them, add a column to each table, “CityST”, then use CityST as the key for MATCH and INDEX. If you’re interested in the demographic data and you want to add city area to it, you can now add column Area to table Demo.
Geo[CityST] = Geo[@City] & ", " & Geo[@ST]
Demo[CityST] = Demo[@City] & ", " & Demo[@ST]
Demo[Area] = INDEX(Geo[Area],MATCH(Demo[@CityST],Geo[CityST],0))
Note that the ", " in the middle of CityST is optional, but it makes the key easier to read and potentially useful as output. If you’re not sure that every record has a match in the remote table, protect the formula with something like =IFNA(formula, "no data").
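The whole concatenated-key join, including the IFNA-style fallback, can be sketched in Python with hypothetical Geo and Demo data: build the CityST key in each table, index one table by it, then look up Area for each Demo record.

```python
# Hypothetical city-level tables keyed by the (City, ST) combination
geo = [
    {"City": "Springfield", "ST": "IL", "Area": 68.0},
    {"City": "Springfield", "ST": "MA", "Area": 33.1},
]
demo = [
    {"City": "Springfield", "ST": "MA", "Population": 155_000},
    {"City": "Portland",    "ST": "OR", "Population": 650_000},
]

# Concatenated key, like [@City] & ", " & [@ST]
geo_by_key = {f'{r["City"]}, {r["ST"]}': r for r in geo}

# Join: for each Demo record, retrieve Area from Geo by CityST,
# with an IFNA-style fallback when there is no matching record
for r in demo:
    key = f'{r["City"]}, {r["ST"]}'
    r["Area"] = geo_by_key[key]["Area"] if key in geo_by_key else "no data"

print(demo[0]["Area"])  # 33.1
print(demo[1]["Area"])  # no data
```

Neither City nor ST alone distinguishes the two Springfields; only the combined key does, which is why the concatenation is needed before any single-key lookup can work.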
Slicers
Table and pivot filters are really great tools, with filter by value, by range, by text, even by cell color. But if users commonly need to filter, consider all the clicks required. Click the filter button, uncheck all values, scroll to find the desired value, check the value, click OK to activate. Do you really want to presume that they are inclined to do that, and will do it efficiently and effectively?
A slicer gives you a button for each value in a field, offering one-click access to filter the data. It makes it obvious that the filter is set, and makes it easy to unset it.
Slicers are applied to fields that represent data drivers or categories, fields that you would normally think of filtering or pivoting. The idea of using slicers is to replace and turn off the default filters, so they should be selected and arranged with care.
Pay attention to slicer sorting. In Excel Options you can define custom lists with a custom sort order; find this option with Alt-F,T,A, then repeat Alt+O four times and hit Enter. Custom lists enable you to auto-fill a series and to sort by the custom order. You can add your own lists, for example, company divisions, markets or product categories. Slicers can use these custom lists to sort the button values; however, the sort does not transfer to other computers, so what looks good to you may not look good to others. Conveying custom lists to others requires an export-import of the registry entry, or a macro to recreate the lists. That’s probably more trouble than it’s worth.
As an alternative, create a column for the purpose of customizing and sorting slicer buttons, with concatenated or prefixed values. It can be hidden from view. For example, if you have a column of course numbers and another with course titles, create a hidden column concatenating the number with a shortened title to feed the slicer buttons. With any column you can add a one or two digit prefix to sort buttons without detracting much from the look.
The downside of slicers is the real estate they require. One way to manage it is to put slicers across the top or on the left side of the screen, and freeze panes. Slicers default as a single column of buttons but you can set the size and number of columns to make them horizontal or rectangular.
Slicer examples are found in the Keyboard Trainer and the Startup Outline Sheet.
Data Entry
Tables are great for storing and accessing data, and for calculating columns from other columns. They’re also not bad for data entry. Using the Tab key you can move across the columns, and once you reach the last column, Tab takes you back to the first column and down a row, just like a typewriter “carriage return”. This works well when you have a reasonable number of columns, so the Tab key might be all the data entry help you need.
If a given column’s entry must be one of a defined list, use Data Validation, List option. For example, a survey might have an affiliation field. With the list range defined and set as the only valid entries, you get a drop-down list in each cell. This reduces keystrokes and errors, and makes the table more like a form. Another option to simulate a data form is the Form tool.
The Form tool is a nice partner for a table, giving an alternate view and entry point for the data. The special service it provides is to put a single record on the stage, where you can see all or most fields at once. Field names are displayed in a column that adjusts to fit the longest name, with a column of uniform value boxes next to it that hold about 10 characters before it begins to scroll. This makes it hard to use with longer entries.
To use the Form tool, install the button on your Quick Access Toolbar or custom ribbon group: Alt-F,T,Q, Alt+C to look in "All commands", type F and scroll to Form. Then with the active cell in a table, click the Form button. Form can display up to 32 fields, with calculated columns grayed out. Due to the limited size of the entry box, it is best suited for numeric or other short entries.