“Pivoting” data with VBA
up vote
2
down vote
favorite
I've attempted to erite some VBA for this this question. The output is OK
but the code is not very elegant. I'm happy-ish with creating unique list of names and certificates, but the rest seems rather ugly. I'd love to learn how to make it more elegant and programmer-like (and less amateur-like-crap).
Sub PivotData()
Dim rng As Range, cll As Range
Dim arr As New Collection, a
Dim var() As Variant
Dim l As Long
Dim lRow As Long, lCol As Long
l = 1
Set rng = Range("A2:C7")
' Create unique list of names
var = Range("A2:A7")
On Error Resume Next
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(l + 1, 5) = arr(l)
Next
Set arr = Nothing
' Create unique list of certificates
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(1, 5 + l) = arr(l)
Next
Set arr = Nothing
On Error GoTo 0
' Ugly code, how to make it more elegant?
Range("F2").FormulaArray = _
"=IFERROR(INDEX(R2C3:R7C3,MATCH(1,((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C)),0)),"""")"
With Range("F2")
lRow = .CurrentRegion.Rows.Count
lCol = .CurrentRegion.Columns.Count + 4
End With
Range("F2:F" & lRow).FillDown
Range(Cells(2, 6), Cells(lRow, lCol)).FillRight
End Sub
algorithm vba excel
New contributor
add a comment |
up vote
2
down vote
favorite
I've attempted to erite some VBA for this this question. The output is OK
but the code is not very elegant. I'm happy-ish with creating unique list of names and certificates, but the rest seems rather ugly. I'd love to learn how to make it more elegant and programmer-like (and less amateur-like-crap).
Sub PivotData()
Dim rng As Range, cll As Range
Dim arr As New Collection, a
Dim var() As Variant
Dim l As Long
Dim lRow As Long, lCol As Long
l = 1
Set rng = Range("A2:C7")
' Create unique list of names
var = Range("A2:A7")
On Error Resume Next
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(l + 1, 5) = arr(l)
Next
Set arr = Nothing
' Create unique list of certificates
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(1, 5 + l) = arr(l)
Next
Set arr = Nothing
On Error GoTo 0
' Ugly code, how to make it more elegant?
Range("F2").FormulaArray = _
"=IFERROR(INDEX(R2C3:R7C3,MATCH(1,((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C)),0)),"""")"
With Range("F2")
lRow = .CurrentRegion.Rows.Count
lCol = .CurrentRegion.Columns.Count + 4
End With
Range("F2:F" & lRow).FillDown
Range(Cells(2, 6), Cells(lRow, lCol)).FillRight
End Sub
algorithm vba excel
New contributor
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I've attempted to erite some VBA for this this question. The output is OK
but the code is not very elegant. I'm happy-ish with creating unique list of names and certificates, but the rest seems rather ugly. I'd love to learn how to make it more elegant and programmer-like (and less amateur-like-crap).
Sub PivotData()
Dim rng As Range, cll As Range
Dim arr As New Collection, a
Dim var() As Variant
Dim l As Long
Dim lRow As Long, lCol As Long
l = 1
Set rng = Range("A2:C7")
' Create unique list of names
var = Range("A2:A7")
On Error Resume Next
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(l + 1, 5) = arr(l)
Next
Set arr = Nothing
' Create unique list of certificates
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(1, 5 + l) = arr(l)
Next
Set arr = Nothing
On Error GoTo 0
' Ugly code, how to make it more elegant?
Range("F2").FormulaArray = _
"=IFERROR(INDEX(R2C3:R7C3,MATCH(1,((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C)),0)),"""")"
With Range("F2")
lRow = .CurrentRegion.Rows.Count
lCol = .CurrentRegion.Columns.Count + 4
End With
Range("F2:F" & lRow).FillDown
Range(Cells(2, 6), Cells(lRow, lCol)).FillRight
End Sub
algorithm vba excel
New contributor
I've attempted to erite some VBA for this this question. The output is OK
but the code is not very elegant. I'm happy-ish with creating unique list of names and certificates, but the rest seems rather ugly. I'd love to learn how to make it more elegant and programmer-like (and less amateur-like-crap).
Sub PivotData()
Dim rng As Range, cll As Range
Dim arr As New Collection, a
Dim var() As Variant
Dim l As Long
Dim lRow As Long, lCol As Long
l = 1
Set rng = Range("A2:C7")
' Create unique list of names
var = Range("A2:A7")
On Error Resume Next
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(l + 1, 5) = arr(l)
Next
Set arr = Nothing
' Create unique list of certificates
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
For l = 1 To arr.Count
Cells(1, 5 + l) = arr(l)
Next
Set arr = Nothing
On Error GoTo 0
' Ugly code, how to make it more elegant?
Range("F2").FormulaArray = _
"=IFERROR(INDEX(R2C3:R7C3,MATCH(1,((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C)),0)),"""")"
With Range("F2")
lRow = .CurrentRegion.Rows.Count
lCol = .CurrentRegion.Columns.Count + 4
End With
Range("F2:F" & lRow).FillDown
Range(Cells(2, 6), Cells(lRow, lCol)).FillRight
End Sub
algorithm vba excel
algorithm vba excel
New contributor
New contributor
New contributor
asked yesterday
Michal Rosa
1134
1134
New contributor
New contributor
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
up vote
2
down vote
accepted
Fully Qualify Ranges
Your code assumes that the correct worksheet will be active when the code is ran. You should get in the habit of Fully Qualifying all range references. This will ensure that your code will work as intended no matter what worksheet is activated.
Technically, to be fully Fully Qualified a range needs to be referenced by workbook and worksheet.
ThisWorkbook.Worksheets("Sheet1").Range ("A2:C7")
Or
Workbooks("Some Book").Worksheets("Sheet1").Range ("A2:C7")
But in general it is acceptable to exclude the workbook if you are not going to be working with multiple workbooks.
Worksheets("Sheet1").Range ("A2:C7")
With statement blocks should be used so that you don't have to repeatedly requalify your ranges.
With ThisWorkbook.Worksheets("Sheet1")
For Each a in .Range ("A2:C7")
Dynamic Ranges and Relative References
Dynamic Ranges should be used when working with records. In this way, you will not have to rewrite you code every time a recorded is added or deleted.
This applies to both ranges
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each a In .Cells
arr.Add a, a
Next
End With
End With
and formulas and FormulaArray
Range("F2").FormulaArray = "=IFERROR(INDEX(OFFSET($C1,1,0,COUNTA($A:$A)-1,1),MATCH(1,((OFFSET($A1,1,0,COUNTA($A:$A)-1,1)=$E2)*(OFFSET($B1,1,0,COUNTA($A:$A)-1,1)=F$1)),0)),"""")"
Having the macro hard the ranges for the FormulaArray is an acceptable middle ground. I would do this to prevent the formulas from slowing up the workbook.
Error Handling
It is best to reduce the scope of On Error Resume Next
as much as possible. This will give you better information when something goes wrong.
On Error Resume Next
For Each a In var
arr.Add a, a
Next
On Error GoTo 0
Or
For Each a In var
On Error Resume Next
arr.Add a, a
On Error GoTo 0
Next
Variable Naming
Don't not use l
as a variable name. It is too hard to distinguish from 1
.
l = 1
does nothing. The For l = 1
initiates l
to 1
.
arr
should only be used to name arrays variables. Personally, I default to data
, result
or results
.
Although there is nothing wrong with a
, I prefer v
, key
, or item
.
Alternate Methods
Here are some other ways of pivoting the data without using worksheet formulas.
Indexing Array Using ArrayList to Sort Headers
This uses ArrayList
to store and sort the headers. The position of the headers in the ArrayList
is used as indices for a 2 dimensional array.
Sub PivotDataIndexedArray()
Dim key As Variant, data() As Variant
Dim rowHeaders As Object, columnHeaders As Object
Set rowHeaders = CreateObject("System.Collections.ArrayList")
Set columnHeaders = CreateObject("System.Collections.ArrayList")
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each key In .Value
If Not rowHeaders.Contains(key) Then rowHeaders.Add key
Next
For Each key In .Offset(0, 1).Value
If Not columnHeaders.Contains(key) Then columnHeaders.Add key
Next
data = .Cells.Resize(, 3).Value
End With
Dim results() As Variant
Dim n As Long
ReDim results(1 To rowHeaders.Count + 2, 1 To columnHeaders.Count + 2)
'Add names
rowHeaders.Sort
For n = 0 To rowHeaders.Count - 1
results(n + 2, 1) = rowHeaders(n)
Next
'Add categories
columnHeaders.Sort
For n = 0 To columnHeaders.Count - 1
results(1, n + 2) = columnHeaders(n)
Next
Dim r As Long, c As Long
'Add dates
For n = 1 To UBound(data)
r = rowHeaders.IndexOf(data(n, 1), 0)
c = columnHeaders.IndexOf(data(n, 2), 0)
results(r + 2, c + 2) = data(n, 3)
Next
.Range("F1").Resize(UBound(results), UBound(results, 2)).Value = results
End With
End Sub
ADODB CrossTab Query
An ADODB.Connection
query can be used to pivot the data. The source data should be the only data on the worksheet, otherwise, you would have to specify the datas' range on the worksheet in the query (e.g [Sheet1$A1:C7]).
Sub PivotDataADODBQuery()
Const SQL As String = "TRANSFORM First([Date])" & vbNewLine & _
"SELECT [Name]" & vbNewLine & _
"FROM [Sheet1$]" & vbNewLine & _
"GROUP BY [Name]" & vbNewLine & _
"PIVOT [Certificate];"
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties='Excel 12.0;HDR=YES;IMEX=1';"
conn.Open
Set rs = conn.Execute(SQL)
With ThisWorkbook.Worksheets("Sheet2")
.Cells.ClearContents
Dim c As Long
For c = 0 To rs.Fields.Count - 1
.Cells(1, c + 1).Value = rs.Fields(c).Name
Next
.Range("A2").CopyFromRecordset rs
End With
rs.Close
conn.Close
End Sub
add a comment |
up vote
1
down vote
I would like to add a few more pieces of advice to what @TinMan has already provided.
Naming
Good naming is one of the most useful and at the same time hardest things in programming. When you have to come back to your procedure 6 months in the furture, you will thank yourself for using descriptive names. In this case, the procedure is relatively short so that understanding what it does is not too hard, but in more complicated code, good names can make a huge difference.
Because of this, my advice is to go beyond the advise @TinMan has provided and to generally use descriptive names. (It is no problem if they get longer in the process.) E.g. the name rng
does not really tell you anything about what it is; calling it sourceRange
would certainly convey its purpose better.
Single Responsibility Principle
Another good guiding principle is the so called single responsibility principle. In short, it says that a usint of code should always only be responsible for one and only one thing. This makes it a lot easier to understand and modify code.
Getting this right is rather hard. However, there is a one rough guidline that can help: whenever you have the urge to add a header comment, you probably want to extract a procedure or function for whatever is done in the section.
In your case, one responsibility is to know how to extract values from a range. You could define a function to do that as follows.
Private Function DistinctValues(inputRange As Excel.Range) As Collection
Dim allValues() As Variant
allValues = inputRange.Value
Dim uniqueValues As Collection
Set uniqueValues = New Collection
Dim currentValue As Variant
For Each currentValue In allValues
On Error Resume Next
uniqueValues.Add currentValue, currentValue
On Error Goto 0
Next
Set DistinctValues = uniqueValues
End Function
If you ever want to change how to get distinct values, e.g. using a Scripting.Dictionary
, you just have to change it in this one place. Moreover, your code is easier to understand if instead of
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
you have
Dim certificatesRange As Excel.Range
Set certificatesRange = Range("B2:B7")
Set distinctCertificates = DistinctValues(certificatesRange)
You could also separate the actual assembling of the new table into a procedure that takes some base point, e.g. the upper-left corner of the target range, a collection of column headers, one of row headers and the data source range.
With this, let us call it BuildPivotTable
, your top procedure would look something like this:
Public Sub PivotData
Dim sourceRange As Excel.Range
Set sourceRange = Range("A2:C7")
Dim columnHeadersRange As Excel.Range
Set columnHeadersRange = Range("B2:B7")
Dim distinctColumnHeaders As Collection
Set distinctColumnHeaders = DistinctValues(columnHeadersRange)
Dim rowHeadersRange As Excel.Range
Set rowHeadersRange = Range("A2:A7")
Dim distinctRowHeaders As Collection
Set distinctRowHeaders = DistinctValues(rowHeadersRange)
Dim targetRange As Excel.Range
Set targetRange = ActiveSheet.Range("E1")
BuildPivotTable targetRange, distinctColumnHeaders, distinctRowHeaders, sourceRange
End Sub
This separates getting the data from doing things with it. Should you want to change where you get your data, you will no longer have to find the appropriate passages between the code doing things with the data. You can even further enhance this by extracting the passages getting the various ranges into their own functions responsible to know where to get the particular data from.
Again, this is not that cruicial for this size of method, but as things grow larger, which tends to happen rather fast whan adding functionality to things, a good separation of responsibilities can help a lot; it certainly justifies the extra code you have to write to achieve it.
Do not Reuse Variables For Different Things
In your code, you first use the variable arr
for distinct names, then for distinct certificates. This makes it harder to follow what the the collection alrady contains and actively hinders good nameing: you cannot name it after what its purpose is if it has multiple ones.
It really does not cost a lot to generate yet another object. So such micro-optimizations should be avoided in favor of ease of reading the code.
Declare Variables Close to Their First Usage
Somewhat related to the last point, it is usually better to declare variables as close to their first usage as possible. This has the advantage that you cannot accidentaly add an access to the variable before that point (The compiler will yell at you.) and that you can be sure at that point that you have a clean object. E.g. a collection will not already contain something.
I know that this contradicts VBA style guids. However, those have been written in the 90s and which practices are deemed useful has involved in the past two decades.
Note that this guideline is much less relevent in short methods following the single responsibility principle.
Data Input
After the general pieces of advice applicable to basically all programming languages, let me come to some more Excel and VBA specific point.
@TinMan already pointed out to possible enhancemets to make the code work with dynamic ranges of data. I would like to add two options: names ranges and list objects.
Named Ranges
I am sure you are aware that you can name ranges in Excel either by writing in the address field in the top left or using the names manager in the formulas tab. You can use these names to specify ranges. If you define a named range Names
as A2:A7
, you can get the range for your names via Worksheets("Sheet1").Range("Names")
, which makes you independent from the specific design of the source sheet.
Tables
Even better would be to turn the input range into an Excel table, which we will call SourceTable
. Then, you can access it as a ListObject
via WorkSheets("Sheet1").ListObjects.Item("SourceTable")
. Moreover, if you add another row, it will simply expand to also contain the new row. This is a lot more convenient than the approach with names ranges, which have to be adjusted to deal with new rows at the bottom.
You can Save the Sheet You Are Working With in a Variable
@TinMan already suggested using a With
block to hold the worksheet you are working with. However, that will not work well when you have to pass it to some method. Instead, you can simply Set
assigne the worksheet to a varable of type Excel.Worksheet
.
Explicitly Calling Default Members
Default members are members on objects that get used automatically when the object is used in a Let
assignmant, i.e. an assignment without the Set
keyword. They are a source of a lot of surprising behaviour, and bugs. Thus, you should always prefer to call the curresponding member explicitly. For Range
, this means using Range.Value
. In Cells(l + 1, 5) = arr(l)
you actually call Cells(l + 1, 5).Value = arr(l)
, provided arr(l)
contains a value type. If it contained a Range
. the call would translate to Cells(l + 1, 5).Value = arr(l).Value
.
Iterating Collections
Since there is not too much data in this example it is not rally a performance problem, but Collection
s are not designed to be iterated using indized. You should use a For Each
loop instead. Unfortunately, it is not possible to use value types as the type for the item to pick from the loop. However, every object type and Variant
will work.
As New is Usually Not a Good Idea
You actually use the capabilitis of the As New
declaration arr As New Collection
, which is not seen to often. This declaration has the rather surprising effect of implicitly adding If arr Is Nothing Then Set arr = new Collection
in front of every access to arr
.
Although this can be used here to clear the variable by setting it to Nothing
, it would be much clearer to simply set it to New Collection
instead.
Because this behaviour is surprising to most poeaple, I would generally advise against using As New
declarations. Moreover, it hurts performance a bit because of the constant checks against Nothing
.
Declare the Type of All Variables
It is already good to see that you declared the type of nearly all variables. However, you did not explicitly declare a
as a Variant
. This lets is vanish against the other declarations all using an As Type
declaration.
Good Stuff! FYI: I decided to writePivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use ofWith Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.
– TinMan
7 hours ago
add a comment |
up vote
0
down vote
You can try something along these lines where you nest your formula calls:
Range("F2").FormulaArray = _
"=IFERROR(" _
& "INDEX(" _
& "R2C3:R7C3," _
& "MATCH(" _
& "1," _
& "((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C))," _
& "0))," _
& """"")"
But the way you have it currently is not that bad
Also would highly suggest wrapping code into a With
block and adding .
's to your Ranges
and Cells
to protect against errors from bad references.
New contributor
add a comment |
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
Fully Qualify Ranges
Your code assumes that the correct worksheet will be active when the code is ran. You should get in the habit of Fully Qualifying all range references. This will ensure that your code will work as intended no matter what worksheet is activated.
Technically, to be fully Fully Qualified a range needs to be referenced by workbook and worksheet.
ThisWorkbook.Worksheets("Sheet1").Range ("A2:C7")
Or
Workbooks("Some Book").Worksheets("Sheet1").Range ("A2:C7")
But in general it is acceptable to exclude the workbook if you are not going to be working with multiple workbooks.
Worksheets("Sheet1").Range ("A2:C7")
With statement blocks should be used so that you don't have to repeatedly requalify your ranges.
With ThisWorkbook.Worksheets("Sheet1")
For Each a in .Range ("A2:C7")
Dynamic Ranges and Relative References
Dynamic Ranges should be used when working with records. In this way, you will not have to rewrite you code every time a recorded is added or deleted.
This applies to both ranges
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each a In .Cells
arr.Add a, a
Next
End With
End With
and formulas and FormulaArray
Range("F2").FormulaArray = "=IFERROR(INDEX(OFFSET($C1,1,0,COUNTA($A:$A)-1,1),MATCH(1,((OFFSET($A1,1,0,COUNTA($A:$A)-1,1)=$E2)*(OFFSET($B1,1,0,COUNTA($A:$A)-1,1)=F$1)),0)),"""")"
Having the macro hard the ranges for the FormulaArray is an acceptable middle ground. I would do this to prevent the formulas from slowing up the workbook.
Error Handling
It is best to reduce the scope of On Error Resume Next
as much as possible. This will give you better information when something goes wrong.
On Error Resume Next
For Each a In var
arr.Add a, a
Next
On Error GoTo 0
Or
For Each a In var
On Error Resume Next
arr.Add a, a
On Error GoTo 0
Next
Variable Naming
Don't not use l
as a variable name. It is too hard to distinguish from 1
.
l = 1
does nothing. The For l = 1
initiates l
to 1
.
arr
should only be used to name arrays variables. Personally, I default to data
, result
or results
.
Although there is nothing wrong with a
, I prefer v
, key
, or item
.
Alternate Methods
Here are some other ways of pivoting the data without using worksheet formulas.
Indexing Array Using ArrayList to Sort Headers
This uses ArrayList
to store and sort the headers. The position of the headers in the ArrayList
is used as indices for a 2 dimensional array.
Sub PivotDataIndexedArray()
Dim key As Variant, data() As Variant
Dim rowHeaders As Object, columnHeaders As Object
Set rowHeaders = CreateObject("System.Collections.ArrayList")
Set columnHeaders = CreateObject("System.Collections.ArrayList")
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each key In .Value
If Not rowHeaders.Contains(key) Then rowHeaders.Add key
Next
For Each key In .Offset(0, 1).Value
If Not columnHeaders.Contains(key) Then columnHeaders.Add key
Next
data = .Cells.Resize(, 3).Value
End With
Dim results() As Variant
Dim n As Long
ReDim results(1 To rowHeaders.Count + 2, 1 To columnHeaders.Count + 2)
'Add names
rowHeaders.Sort
For n = 0 To rowHeaders.Count - 1
results(n + 2, 1) = rowHeaders(n)
Next
'Add categories
columnHeaders.Sort
For n = 0 To columnHeaders.Count - 1
results(1, n + 2) = columnHeaders(n)
Next
Dim r As Long, c As Long
'Add dates
For n = 1 To UBound(data)
r = rowHeaders.IndexOf(data(n, 1), 0)
c = columnHeaders.IndexOf(data(n, 2), 0)
results(r + 2, c + 2) = data(n, 3)
Next
.Range("F1").Resize(UBound(results), UBound(results, 2)).Value = results
End With
End Sub
ADODB CrossTab Query
An ADODB.Connection
query can be used to pivot the data. The source data should be the only data on the worksheet, otherwise, you would have to specify the datas' range on the worksheet in the query (e.g [Sheet1$A1:C7]).
Sub PivotDataADODBQuery()
Const SQL As String = "TRANSFORM First([Date])" & vbNewLine & _
"SELECT [Name]" & vbNewLine & _
"FROM [Sheet1$]" & vbNewLine & _
"GROUP BY [Name]" & vbNewLine & _
"PIVOT [Certificate];"
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties='Excel 12.0;HDR=YES;IMEX=1';"
conn.Open
Set rs = conn.Execute(SQL)
With ThisWorkbook.Worksheets("Sheet2")
.Cells.ClearContents
Dim c As Long
For c = 0 To rs.Fields.Count - 1
.Cells(1, c + 1).Value = rs.Fields(c).Name
Next
.Range("A2").CopyFromRecordset rs
End With
rs.Close
conn.Close
End Sub
add a comment |
up vote
2
down vote
accepted
Fully Qualify Ranges
Your code assumes that the correct worksheet will be active when the code is ran. You should get in the habit of Fully Qualifying all range references. This will ensure that your code will work as intended no matter what worksheet is activated.
Technically, to be fully Fully Qualified a range needs to be referenced by workbook and worksheet.
ThisWorkbook.Worksheets("Sheet1").Range ("A2:C7")
Or
Workbooks("Some Book").Worksheets("Sheet1").Range ("A2:C7")
But in general it is acceptable to exclude the workbook if you are not going to be working with multiple workbooks.
Worksheets("Sheet1").Range ("A2:C7")
With statement blocks should be used so that you don't have to repeatedly requalify your ranges.
With ThisWorkbook.Worksheets("Sheet1")
For Each a in .Range ("A2:C7")
Dynamic Ranges and Relative References
Dynamic Ranges should be used when working with records. In this way, you will not have to rewrite you code every time a recorded is added or deleted.
This applies to both ranges
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each a In .Cells
arr.Add a, a
Next
End With
End With
and formulas and FormulaArray
Range("F2").FormulaArray = "=IFERROR(INDEX(OFFSET($C1,1,0,COUNTA($A:$A)-1,1),MATCH(1,((OFFSET($A1,1,0,COUNTA($A:$A)-1,1)=$E2)*(OFFSET($B1,1,0,COUNTA($A:$A)-1,1)=F$1)),0)),"""")"
Having the macro hard the ranges for the FormulaArray is an acceptable middle ground. I would do this to prevent the formulas from slowing up the workbook.
Error Handling
It is best to reduce the scope of On Error Resume Next
as much as possible. This will give you better information when something goes wrong.
On Error Resume Next
For Each a In var
arr.Add a, a
Next
On Error GoTo 0
Or
For Each a In var
On Error Resume Next
arr.Add a, a
On Error GoTo 0
Next
Variable Naming
Don't not use l
as a variable name. It is too hard to distinguish from 1
.
l = 1
does nothing. The For l = 1
initiates l
to 1
.
arr
should only be used to name arrays variables. Personally, I default to data
, result
or results
.
Although there is nothing wrong with a
, I prefer v
, key
, or item
.
Alternate Methods
Here are some other ways of pivoting the data without using worksheet formulas.
Indexing Array Using ArrayList to Sort Headers
This uses ArrayList
to store and sort the headers. The position of the headers in the ArrayList
is used as indices for a 2 dimensional array.
Sub PivotDataIndexedArray()
Dim key As Variant, data() As Variant
Dim rowHeaders As Object, columnHeaders As Object
Set rowHeaders = CreateObject("System.Collections.ArrayList")
Set columnHeaders = CreateObject("System.Collections.ArrayList")
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each key In .Value
If Not rowHeaders.Contains(key) Then rowHeaders.Add key
Next
For Each key In .Offset(0, 1).Value
If Not columnHeaders.Contains(key) Then columnHeaders.Add key
Next
data = .Cells.Resize(, 3).Value
End With
Dim results() As Variant
Dim n As Long
ReDim results(1 To rowHeaders.Count + 2, 1 To columnHeaders.Count + 2)
'Add names
rowHeaders.Sort
For n = 0 To rowHeaders.Count - 1
results(n + 2, 1) = rowHeaders(n)
Next
'Add categories
columnHeaders.Sort
For n = 0 To columnHeaders.Count - 1
results(1, n + 2) = columnHeaders(n)
Next
Dim r As Long, c As Long
'Add dates
For n = 1 To UBound(data)
r = rowHeaders.IndexOf(data(n, 1), 0)
c = columnHeaders.IndexOf(data(n, 2), 0)
results(r + 2, c + 2) = data(n, 3)
Next
.Range("F1").Resize(UBound(results), UBound(results, 2)).Value = results
End With
End Sub
ADODB CrossTab Query
An ADODB.Connection
query can be used to pivot the data. The source data should be the only data on the worksheet, otherwise, you would have to specify the datas' range on the worksheet in the query (e.g [Sheet1$A1:C7]).
Sub PivotDataADODBQuery()
Const SQL As String = "TRANSFORM First([Date])" & vbNewLine & _
"SELECT [Name]" & vbNewLine & _
"FROM [Sheet1$]" & vbNewLine & _
"GROUP BY [Name]" & vbNewLine & _
"PIVOT [Certificate];"
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties='Excel 12.0;HDR=YES;IMEX=1';"
conn.Open
Set rs = conn.Execute(SQL)
With ThisWorkbook.Worksheets("Sheet2")
.Cells.ClearContents
Dim c As Long
For c = 0 To rs.Fields.Count - 1
.Cells(1, c + 1).Value = rs.Fields(c).Name
Next
.Range("A2").CopyFromRecordset rs
End With
rs.Close
conn.Close
End Sub
add a comment |
up vote
2
down vote
accepted
up vote
2
down vote
accepted
Fully Qualify Ranges
Your code assumes that the correct worksheet will be active when the code is ran. You should get in the habit of Fully Qualifying all range references. This will ensure that your code will work as intended no matter what worksheet is activated.
Technically, to be fully Fully Qualified a range needs to be referenced by workbook and worksheet.
ThisWorkbook.Worksheets("Sheet1").Range ("A2:C7")
Or
Workbooks("Some Book").Worksheets("Sheet1").Range ("A2:C7")
But in general it is acceptable to exclude the workbook if you are not going to be working with multiple workbooks.
Worksheets("Sheet1").Range ("A2:C7")
With statement blocks should be used so that you don't have to repeatedly requalify your ranges.
With ThisWorkbook.Worksheets("Sheet1")
For Each a in .Range ("A2:C7")
Dynamic Ranges and Relative References
Dynamic Ranges should be used when working with records. In this way, you will not have to rewrite you code every time a recorded is added or deleted.
This applies to both ranges
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each a In .Cells
arr.Add a, a
Next
End With
End With
and formulas and FormulaArray
Range("F2").FormulaArray = "=IFERROR(INDEX(OFFSET($C1,1,0,COUNTA($A:$A)-1,1),MATCH(1,((OFFSET($A1,1,0,COUNTA($A:$A)-1,1)=$E2)*(OFFSET($B1,1,0,COUNTA($A:$A)-1,1)=F$1)),0)),"""")"
Having the macro hard the ranges for the FormulaArray is an acceptable middle ground. I would do this to prevent the formulas from slowing up the workbook.
Error Handling
It is best to reduce the scope of On Error Resume Next
as much as possible. This will give you better information when something goes wrong.
On Error Resume Next
For Each a In var
arr.Add a, a
Next
On Error GoTo 0
Or
For Each a In var
On Error Resume Next
arr.Add a, a
On Error GoTo 0
Next
Variable Naming
Don't not use l
as a variable name. It is too hard to distinguish from 1
.
l = 1
does nothing. The For l = 1
initiates l
to 1
.
arr
should only be used to name arrays variables. Personally, I default to data
, result
or results
.
Although there is nothing wrong with a
, I prefer v
, key
, or item
.
Alternate Methods
Here are some other ways of pivoting the data without using worksheet formulas.
Indexing Array Using ArrayList to Sort Headers
This uses ArrayList
to store and sort the headers. The position of the headers in the ArrayList
is used as indices for a 2 dimensional array.
Sub PivotDataIndexedArray()
Dim key As Variant, data() As Variant
Dim rowHeaders As Object, columnHeaders As Object
Set rowHeaders = CreateObject("System.Collections.ArrayList")
Set columnHeaders = CreateObject("System.Collections.ArrayList")
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each key In .Value
If Not rowHeaders.Contains(key) Then rowHeaders.Add key
Next
For Each key In .Offset(0, 1).Value
If Not columnHeaders.Contains(key) Then columnHeaders.Add key
Next
data = .Cells.Resize(, 3).Value
End With
Dim results() As Variant
Dim n As Long
ReDim results(1 To rowHeaders.Count + 2, 1 To columnHeaders.Count + 2)
'Add names
rowHeaders.Sort
For n = 0 To rowHeaders.Count - 1
results(n + 2, 1) = rowHeaders(n)
Next
'Add categories
columnHeaders.Sort
For n = 0 To columnHeaders.Count - 1
results(1, n + 2) = columnHeaders(n)
Next
Dim r As Long, c As Long
'Add dates
For n = 1 To UBound(data)
r = rowHeaders.IndexOf(data(n, 1), 0)
c = columnHeaders.IndexOf(data(n, 2), 0)
results(r + 2, c + 2) = data(n, 3)
Next
.Range("F1").Resize(UBound(results), UBound(results, 2)).Value = results
End With
End Sub
ADODB CrossTab Query
An ADODB.Connection
query can be used to pivot the data. The source data should be the only data on the worksheet, otherwise, you would have to specify the datas' range on the worksheet in the query (e.g [Sheet1$A1:C7]).
Sub PivotDataADODBQuery()
Const SQL As String = "TRANSFORM First([Date])" & vbNewLine & _
"SELECT [Name]" & vbNewLine & _
"FROM [Sheet1$]" & vbNewLine & _
"GROUP BY [Name]" & vbNewLine & _
"PIVOT [Certificate];"
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties='Excel 12.0;HDR=YES;IMEX=1';"
conn.Open
Set rs = conn.Execute(SQL)
With ThisWorkbook.Worksheets("Sheet2")
.Cells.ClearContents
Dim c As Long
For c = 0 To rs.Fields.Count - 1
.Cells(1, c + 1).Value = rs.Fields(c).Name
Next
.Range("A2").CopyFromRecordset rs
End With
rs.Close
conn.Close
End Sub
Fully Qualify Ranges
Your code assumes that the correct worksheet will be active when the code is ran. You should get in the habit of Fully Qualifying all range references. This will ensure that your code will work as intended no matter what worksheet is activated.
Technically, to be fully Fully Qualified a range needs to be referenced by workbook and worksheet.
ThisWorkbook.Worksheets("Sheet1").Range ("A2:C7")
Or
Workbooks("Some Book").Worksheets("Sheet1").Range ("A2:C7")
But in general it is acceptable to exclude the workbook if you are not going to be working with multiple workbooks.
Worksheets("Sheet1").Range ("A2:C7")
With statement blocks should be used so that you don't have to repeatedly requalify your ranges.
With ThisWorkbook.Worksheets("Sheet1")
For Each a in .Range ("A2:C7")
Dynamic Ranges and Relative References
Dynamic Ranges should be used when working with records. In this way, you will not have to rewrite you code every time a recorded is added or deleted.
This applies to both ranges
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each a In .Cells
arr.Add a, a
Next
End With
End With
and formulas and FormulaArray
Range("F2").FormulaArray = "=IFERROR(INDEX(OFFSET($C1,1,0,COUNTA($A:$A)-1,1),MATCH(1,((OFFSET($A1,1,0,COUNTA($A:$A)-1,1)=$E2)*(OFFSET($B1,1,0,COUNTA($A:$A)-1,1)=F$1)),0)),"""")"
Having the macro hard the ranges for the FormulaArray is an acceptable middle ground. I would do this to prevent the formulas from slowing up the workbook.
Error Handling
It is best to reduce the scope of On Error Resume Next
as much as possible. This will give you better information when something goes wrong.
On Error Resume Next
For Each a In var
arr.Add a, a
Next
On Error GoTo 0
Or
For Each a In var
On Error Resume Next
arr.Add a, a
On Error GoTo 0
Next
Variable Naming
Don't not use l
as a variable name. It is too hard to distinguish from 1
.
l = 1
does nothing. The For l = 1
initiates l
to 1
.
arr
should only be used to name arrays variables. Personally, I default to data
, result
or results
.
Although there is nothing wrong with a
, I prefer v
, key
, or item
.
Alternate Methods
Here are some other ways of pivoting the data without using worksheet formulas.
Indexing Array Using ArrayList to Sort Headers
This uses ArrayList
to store and sort the headers. The position of the headers in the ArrayList
is used as indices for a 2 dimensional array.
Sub PivotDataIndexedArray()
Dim key As Variant, data() As Variant
Dim rowHeaders As Object, columnHeaders As Object
Set rowHeaders = CreateObject("System.Collections.ArrayList")
Set columnHeaders = CreateObject("System.Collections.ArrayList")
With ThisWorkbook.Worksheets("Sheet1")
With .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
For Each key In .Value
If Not rowHeaders.Contains(key) Then rowHeaders.Add key
Next
For Each key In .Offset(0, 1).Value
If Not columnHeaders.Contains(key) Then columnHeaders.Add key
Next
data = .Cells.Resize(, 3).Value
End With
Dim results() As Variant
Dim n As Long
ReDim results(1 To rowHeaders.Count + 2, 1 To columnHeaders.Count + 2)
'Add names
rowHeaders.Sort
For n = 0 To rowHeaders.Count - 1
results(n + 2, 1) = rowHeaders(n)
Next
'Add categories
columnHeaders.Sort
For n = 0 To columnHeaders.Count - 1
results(1, n + 2) = columnHeaders(n)
Next
Dim r As Long, c As Long
'Add dates
For n = 1 To UBound(data)
r = rowHeaders.IndexOf(data(n, 1), 0)
c = columnHeaders.IndexOf(data(n, 2), 0)
results(r + 2, c + 2) = data(n, 3)
Next
.Range("F1").Resize(UBound(results), UBound(results, 2)).Value = results
End With
End Sub
ADODB CrossTab Query
An ADODB.Connection
query can be used to pivot the data. The source data should be the only data on the worksheet, otherwise, you would have to specify the datas' range on the worksheet in the query (e.g [Sheet1$A1:C7]).
Sub PivotDataADODBQuery()
Const SQL As String = "TRANSFORM First([Date])" & vbNewLine & _
"SELECT [Name]" & vbNewLine & _
"FROM [Sheet1$]" & vbNewLine & _
"GROUP BY [Name]" & vbNewLine & _
"PIVOT [Certificate];"
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties='Excel 12.0;HDR=YES;IMEX=1';"
conn.Open
Set rs = conn.Execute(SQL)
With ThisWorkbook.Worksheets("Sheet2")
.Cells.ClearContents
Dim c As Long
For c = 0 To rs.Fields.Count - 1
.Cells(1, c + 1).Value = rs.Fields(c).Name
Next
.Range("A2").CopyFromRecordset rs
End With
rs.Close
conn.Close
End Sub
answered 13 hours ago
TinMan
91318
91318
add a comment |
add a comment |
up vote
1
down vote
I would like to add a few more pieces of advice to what @TinMan has already provided.
Naming
Good naming is one of the most useful and at the same time hardest things in programming. When you have to come back to your procedure 6 months in the furture, you will thank yourself for using descriptive names. In this case, the procedure is relatively short so that understanding what it does is not too hard, but in more complicated code, good names can make a huge difference.
Because of this, my advice is to go beyond the advise @TinMan has provided and to generally use descriptive names. (It is no problem if they get longer in the process.) E.g. the name rng
does not really tell you anything about what it is; calling it sourceRange
would certainly convey its purpose better.
Single Responsibility Principle
Another good guiding principle is the so called single responsibility principle. In short, it says that a usint of code should always only be responsible for one and only one thing. This makes it a lot easier to understand and modify code.
Getting this right is rather hard. However, there is a one rough guidline that can help: whenever you have the urge to add a header comment, you probably want to extract a procedure or function for whatever is done in the section.
In your case, one responsibility is to know how to extract values from a range. You could define a function to do that as follows.
Private Function DistinctValues(inputRange As Excel.Range) As Collection
Dim allValues() As Variant
allValues = inputRange.Value
Dim uniqueValues As Collection
Set uniqueValues = New Collection
Dim currentValue As Variant
For Each currentValue In allValues
On Error Resume Next
uniqueValues.Add currentValue, currentValue
On Error Goto 0
Next
Set DistinctValues = uniqueValues
End Function
If you ever want to change how to get distinct values, e.g. using a Scripting.Dictionary
, you just have to change it in this one place. Moreover, your code is easier to understand if instead of
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
you have
Dim certificatesRange As Excel.Range
Set certificatesRange = Range("B2:B7")
Set distinctCertificates = DistinctValues(certificatesRange)
You could also separate the actual assembling of the new table into a procedure that takes some base point, e.g. the upper-left corner of the target range, a collection of column headers, one of row headers and the data source range.
With this, let us call it BuildPivotTable
, your top procedure would look something like this:
Public Sub PivotData
Dim sourceRange As Excel.Range
Set sourceRange = Range("A2:C7")
Dim columnHeadersRange As Excel.Range
Set columnHeadersRange = Range("B2:B7")
Dim distinctColumnHeaders As Collection
Set distinctColumnHeaders = DistinctValues(columnHeadersRange)
Dim rowHeadersRange As Excel.Range
Set rowHeadersRange = Range("A2:A7")
Dim distinctRowHeaders As Collection
Set distinctRowHeaders = DistinctValues(rowHeadersRange)
Dim targetRange As Excel.Range
Set targetRange = ActiveSheet.Range("E1")
BuildPivotTable targetRange, distinctColumnHeaders, distinctRowHeaders, sourceRange
End Sub
This separates getting the data from doing things with it. Should you want to change where you get your data, you will no longer have to find the appropriate passages between the code doing things with the data. You can even further enhance this by extracting the passages getting the various ranges into their own functions responsible to know where to get the particular data from.
Again, this is not that cruicial for this size of method, but as things grow larger, which tends to happen rather fast whan adding functionality to things, a good separation of responsibilities can help a lot; it certainly justifies the extra code you have to write to achieve it.
Do not Reuse Variables For Different Things
In your code, you first use the variable arr
for distinct names, then for distinct certificates. This makes it harder to follow what the the collection alrady contains and actively hinders good nameing: you cannot name it after what its purpose is if it has multiple ones.
It really does not cost a lot to generate yet another object. So such micro-optimizations should be avoided in favor of ease of reading the code.
Declare Variables Close to Their First Usage
Somewhat related to the last point, it is usually better to declare variables as close to their first usage as possible. This has the advantage that you cannot accidentaly add an access to the variable before that point (The compiler will yell at you.) and that you can be sure at that point that you have a clean object. E.g. a collection will not already contain something.
I know that this contradicts VBA style guids. However, those have been written in the 90s and which practices are deemed useful has involved in the past two decades.
Note that this guideline is much less relevent in short methods following the single responsibility principle.
Data Input
After the general pieces of advice applicable to basically all programming languages, let me come to some more Excel and VBA specific point.
@TinMan already pointed out to possible enhancemets to make the code work with dynamic ranges of data. I would like to add two options: names ranges and list objects.
Named Ranges
I am sure you are aware that you can name ranges in Excel either by writing in the address field in the top left or using the names manager in the formulas tab. You can use these names to specify ranges. If you define a named range Names
as A2:A7
, you can get the range for your names via Worksheets("Sheet1").Range("Names")
, which makes you independent from the specific design of the source sheet.
Tables
Even better would be to turn the input range into an Excel table, which we will call SourceTable
. Then, you can access it as a ListObject
via WorkSheets("Sheet1").ListObjects.Item("SourceTable")
. Moreover, if you add another row, it will simply expand to also contain the new row. This is a lot more convenient than the approach with names ranges, which have to be adjusted to deal with new rows at the bottom.
You can Save the Sheet You Are Working With in a Variable
@TinMan already suggested using a With
block to hold the worksheet you are working with. However, that will not work well when you have to pass it to some method. Instead, you can simply Set
assigne the worksheet to a varable of type Excel.Worksheet
.
Explicitly Calling Default Members
Default members are members on objects that get used automatically when the object is used in a Let
assignmant, i.e. an assignment without the Set
keyword. They are a source of a lot of surprising behaviour, and bugs. Thus, you should always prefer to call the curresponding member explicitly. For Range
, this means using Range.Value
. In Cells(l + 1, 5) = arr(l)
you actually call Cells(l + 1, 5).Value = arr(l)
, provided arr(l)
contains a value type. If it contained a Range
. the call would translate to Cells(l + 1, 5).Value = arr(l).Value
.
Iterating Collections
Since there is not too much data in this example it is not rally a performance problem, but Collection
s are not designed to be iterated using indized. You should use a For Each
loop instead. Unfortunately, it is not possible to use value types as the type for the item to pick from the loop. However, every object type and Variant
will work.
As New is Usually Not a Good Idea
You actually use the capabilitis of the As New
declaration arr As New Collection
, which is not seen to often. This declaration has the rather surprising effect of implicitly adding If arr Is Nothing Then Set arr = new Collection
in front of every access to arr
.
Although this can be used here to clear the variable by setting it to Nothing
, it would be much clearer to simply set it to New Collection
instead.
Because this behaviour is surprising to most poeaple, I would generally advise against using As New
declarations. Moreover, it hurts performance a bit because of the constant checks against Nothing
.
Declare the Type of All Variables
It is already good to see that you declared the type of nearly all variables. However, you did not explicitly declare a
as a Variant
. This lets is vanish against the other declarations all using an As Type
declaration.
Good Stuff! FYI: I decided to writePivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use ofWith Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.
– TinMan
7 hours ago
add a comment |
up vote
1
down vote
I would like to add a few more pieces of advice to what @TinMan has already provided.
Naming
Good naming is one of the most useful and at the same time hardest things in programming. When you have to come back to your procedure 6 months in the furture, you will thank yourself for using descriptive names. In this case, the procedure is relatively short so that understanding what it does is not too hard, but in more complicated code, good names can make a huge difference.
Because of this, my advice is to go beyond the advise @TinMan has provided and to generally use descriptive names. (It is no problem if they get longer in the process.) E.g. the name rng
does not really tell you anything about what it is; calling it sourceRange
would certainly convey its purpose better.
Single Responsibility Principle
Another good guiding principle is the so called single responsibility principle. In short, it says that a usint of code should always only be responsible for one and only one thing. This makes it a lot easier to understand and modify code.
Getting this right is rather hard. However, there is a one rough guidline that can help: whenever you have the urge to add a header comment, you probably want to extract a procedure or function for whatever is done in the section.
In your case, one responsibility is to know how to extract values from a range. You could define a function to do that as follows.
Private Function DistinctValues(inputRange As Excel.Range) As Collection
Dim allValues() As Variant
allValues = inputRange.Value
Dim uniqueValues As Collection
Set uniqueValues = New Collection
Dim currentValue As Variant
For Each currentValue In allValues
On Error Resume Next
uniqueValues.Add currentValue, currentValue
On Error Goto 0
Next
Set DistinctValues = uniqueValues
End Function
If you ever want to change how to get distinct values, e.g. using a Scripting.Dictionary
, you just have to change it in this one place. Moreover, your code is easier to understand if instead of
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
you have
Dim certificatesRange As Excel.Range
Set certificatesRange = Range("B2:B7")
Set distinctCertificates = DistinctValues(certificatesRange)
You could also separate the actual assembling of the new table into a procedure that takes some base point, e.g. the upper-left corner of the target range, a collection of column headers, one of row headers and the data source range.
With this, let us call it BuildPivotTable
, your top procedure would look something like this:
Public Sub PivotData
Dim sourceRange As Excel.Range
Set sourceRange = Range("A2:C7")
Dim columnHeadersRange As Excel.Range
Set columnHeadersRange = Range("B2:B7")
Dim distinctColumnHeaders As Collection
Set distinctColumnHeaders = DistinctValues(columnHeadersRange)
Dim rowHeadersRange As Excel.Range
Set rowHeadersRange = Range("A2:A7")
Dim distinctRowHeaders As Collection
Set distinctRowHeaders = DistinctValues(rowHeadersRange)
Dim targetRange As Excel.Range
Set targetRange = ActiveSheet.Range("E1")
BuildPivotTable targetRange, distinctColumnHeaders, distinctRowHeaders, sourceRange
End Sub
This separates getting the data from doing things with it. Should you want to change where you get your data, you will no longer have to find the appropriate passages between the code doing things with the data. You can even further enhance this by extracting the passages getting the various ranges into their own functions responsible to know where to get the particular data from.
Again, this is not that cruicial for this size of method, but as things grow larger, which tends to happen rather fast whan adding functionality to things, a good separation of responsibilities can help a lot; it certainly justifies the extra code you have to write to achieve it.
Do not Reuse Variables For Different Things
In your code, you first use the variable arr
for distinct names, then for distinct certificates. This makes it harder to follow what the the collection alrady contains and actively hinders good nameing: you cannot name it after what its purpose is if it has multiple ones.
It really does not cost a lot to generate yet another object. So such micro-optimizations should be avoided in favor of ease of reading the code.
Declare Variables Close to Their First Usage
Somewhat related to the last point, it is usually better to declare variables as close to their first usage as possible. This has the advantage that you cannot accidentaly add an access to the variable before that point (The compiler will yell at you.) and that you can be sure at that point that you have a clean object. E.g. a collection will not already contain something.
I know that this contradicts VBA style guids. However, those have been written in the 90s and which practices are deemed useful has involved in the past two decades.
Note that this guideline is much less relevent in short methods following the single responsibility principle.
Data Input
After the general pieces of advice applicable to basically all programming languages, let me come to some more Excel and VBA specific point.
@TinMan already pointed out to possible enhancemets to make the code work with dynamic ranges of data. I would like to add two options: names ranges and list objects.
Named Ranges
I am sure you are aware that you can name ranges in Excel either by writing in the address field in the top left or using the names manager in the formulas tab. You can use these names to specify ranges. If you define a named range Names
as A2:A7
, you can get the range for your names via Worksheets("Sheet1").Range("Names")
, which makes you independent from the specific design of the source sheet.
Tables
Even better would be to turn the input range into an Excel table, which we will call SourceTable
. Then, you can access it as a ListObject
via WorkSheets("Sheet1").ListObjects.Item("SourceTable")
. Moreover, if you add another row, it will simply expand to also contain the new row. This is a lot more convenient than the approach with names ranges, which have to be adjusted to deal with new rows at the bottom.
You can Save the Sheet You Are Working With in a Variable
@TinMan already suggested using a With
block to hold the worksheet you are working with. However, that will not work well when you have to pass it to some method. Instead, you can simply Set
assigne the worksheet to a varable of type Excel.Worksheet
.
Explicitly Calling Default Members
Default members are members on objects that get used automatically when the object is used in a Let
assignmant, i.e. an assignment without the Set
keyword. They are a source of a lot of surprising behaviour, and bugs. Thus, you should always prefer to call the curresponding member explicitly. For Range
, this means using Range.Value
. In Cells(l + 1, 5) = arr(l)
you actually call Cells(l + 1, 5).Value = arr(l)
, provided arr(l)
contains a value type. If it contained a Range
. the call would translate to Cells(l + 1, 5).Value = arr(l).Value
.
Iterating Collections
Since there is not too much data in this example it is not rally a performance problem, but Collection
s are not designed to be iterated using indized. You should use a For Each
loop instead. Unfortunately, it is not possible to use value types as the type for the item to pick from the loop. However, every object type and Variant
will work.
As New is Usually Not a Good Idea
You actually use the capabilitis of the As New
declaration arr As New Collection
, which is not seen to often. This declaration has the rather surprising effect of implicitly adding If arr Is Nothing Then Set arr = new Collection
in front of every access to arr
.
Although this can be used here to clear the variable by setting it to Nothing
, it would be much clearer to simply set it to New Collection
instead.
Because this behaviour is surprising to most poeaple, I would generally advise against using As New
declarations. Moreover, it hurts performance a bit because of the constant checks against Nothing
.
Declare the Type of All Variables
It is already good to see that you declared the type of nearly all variables. However, you did not explicitly declare a
as a Variant
. This lets is vanish against the other declarations all using an As Type
declaration.
Good Stuff! FYI: I decided to writePivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use ofWith Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.
– TinMan
7 hours ago
add a comment |
up vote
1
down vote
up vote
1
down vote
I would like to add a few more pieces of advice to what @TinMan has already provided.
Naming
Good naming is one of the most useful and at the same time hardest things in programming. When you have to come back to your procedure 6 months in the furture, you will thank yourself for using descriptive names. In this case, the procedure is relatively short so that understanding what it does is not too hard, but in more complicated code, good names can make a huge difference.
Because of this, my advice is to go beyond the advise @TinMan has provided and to generally use descriptive names. (It is no problem if they get longer in the process.) E.g. the name rng
does not really tell you anything about what it is; calling it sourceRange
would certainly convey its purpose better.
Single Responsibility Principle
Another good guiding principle is the so called single responsibility principle. In short, it says that a usint of code should always only be responsible for one and only one thing. This makes it a lot easier to understand and modify code.
Getting this right is rather hard. However, there is a one rough guidline that can help: whenever you have the urge to add a header comment, you probably want to extract a procedure or function for whatever is done in the section.
In your case, one responsibility is to know how to extract values from a range. You could define a function to do that as follows.
Private Function DistinctValues(inputRange As Excel.Range) As Collection
Dim allValues() As Variant
allValues = inputRange.Value
Dim uniqueValues As Collection
Set uniqueValues = New Collection
Dim currentValue As Variant
For Each currentValue In allValues
On Error Resume Next
uniqueValues.Add currentValue, currentValue
On Error Goto 0
Next
Set DistinctValues = uniqueValues
End Function
If you ever want to change how to get distinct values, e.g. using a Scripting.Dictionary
, you just have to change it in this one place. Moreover, your code is easier to understand if instead of
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
you have
Dim certificatesRange As Excel.Range
Set certificatesRange = Range("B2:B7")
Set distinctCertificates = DistinctValues(certificatesRange)
You could also separate the actual assembling of the new table into a procedure that takes some base point, e.g. the upper-left corner of the target range, a collection of column headers, one of row headers and the data source range.
With this, let us call it BuildPivotTable
, your top procedure would look something like this:
Public Sub PivotData
Dim sourceRange As Excel.Range
Set sourceRange = Range("A2:C7")
Dim columnHeadersRange As Excel.Range
Set columnHeadersRange = Range("B2:B7")
Dim distinctColumnHeaders As Collection
Set distinctColumnHeaders = DistinctValues(columnHeadersRange)
Dim rowHeadersRange As Excel.Range
Set rowHeadersRange = Range("A2:A7")
Dim distinctRowHeaders As Collection
Set distinctRowHeaders = DistinctValues(rowHeadersRange)
Dim targetRange As Excel.Range
Set targetRange = ActiveSheet.Range("E1")
BuildPivotTable targetRange, distinctColumnHeaders, distinctRowHeaders, sourceRange
End Sub
This separates getting the data from doing things with it. Should you want to change where you get your data, you will no longer have to find the appropriate passages between the code doing things with the data. You can even further enhance this by extracting the passages getting the various ranges into their own functions responsible to know where to get the particular data from.
Again, this is not that cruicial for this size of method, but as things grow larger, which tends to happen rather fast whan adding functionality to things, a good separation of responsibilities can help a lot; it certainly justifies the extra code you have to write to achieve it.
Do not Reuse Variables For Different Things
In your code, you first use the variable arr
for distinct names, then for distinct certificates. This makes it harder to follow what the the collection alrady contains and actively hinders good nameing: you cannot name it after what its purpose is if it has multiple ones.
It really does not cost a lot to generate yet another object. So such micro-optimizations should be avoided in favor of ease of reading the code.
Declare Variables Close to Their First Usage
Somewhat related to the last point, it is usually better to declare variables as close to their first usage as possible. This has the advantage that you cannot accidentaly add an access to the variable before that point (The compiler will yell at you.) and that you can be sure at that point that you have a clean object. E.g. a collection will not already contain something.
I know that this contradicts VBA style guids. However, those have been written in the 90s and which practices are deemed useful has involved in the past two decades.
Note that this guideline is much less relevent in short methods following the single responsibility principle.
Data Input
After the general pieces of advice applicable to basically all programming languages, let me come to some more Excel and VBA specific point.
@TinMan already pointed out to possible enhancemets to make the code work with dynamic ranges of data. I would like to add two options: names ranges and list objects.
Named Ranges
I am sure you are aware that you can name ranges in Excel either by writing in the address field in the top left or using the names manager in the formulas tab. You can use these names to specify ranges. If you define a named range Names
as A2:A7
, you can get the range for your names via Worksheets("Sheet1").Range("Names")
, which makes you independent from the specific design of the source sheet.
Tables
Even better would be to turn the input range into an Excel table, which we will call SourceTable
. Then, you can access it as a ListObject
via WorkSheets("Sheet1").ListObjects.Item("SourceTable")
. Moreover, if you add another row, it will simply expand to also contain the new row. This is a lot more convenient than the approach with names ranges, which have to be adjusted to deal with new rows at the bottom.
You can Save the Sheet You Are Working With in a Variable
@TinMan already suggested using a With
block to hold the worksheet you are working with. However, that will not work well when you have to pass it to some method. Instead, you can simply Set
assigne the worksheet to a varable of type Excel.Worksheet
.
Explicitly Calling Default Members
Default members are members on objects that get used automatically when the object is used in a Let
assignmant, i.e. an assignment without the Set
keyword. They are a source of a lot of surprising behaviour, and bugs. Thus, you should always prefer to call the curresponding member explicitly. For Range
, this means using Range.Value
. In Cells(l + 1, 5) = arr(l)
you actually call Cells(l + 1, 5).Value = arr(l)
, provided arr(l)
contains a value type. If it contained a Range
. the call would translate to Cells(l + 1, 5).Value = arr(l).Value
.
Iterating Collections
Since there is not too much data in this example it is not rally a performance problem, but Collection
s are not designed to be iterated using indized. You should use a For Each
loop instead. Unfortunately, it is not possible to use value types as the type for the item to pick from the loop. However, every object type and Variant
will work.
As New is Usually Not a Good Idea
You actually use the capabilitis of the As New
declaration arr As New Collection
, which is not seen to often. This declaration has the rather surprising effect of implicitly adding If arr Is Nothing Then Set arr = new Collection
in front of every access to arr
.
Although this can be used here to clear the variable by setting it to Nothing
, it would be much clearer to simply set it to New Collection
instead.
Because this behaviour is surprising to most poeaple, I would generally advise against using As New
declarations. Moreover, it hurts performance a bit because of the constant checks against Nothing
.
Declare the Type of All Variables
It is already good to see that you declared the type of nearly all variables. However, you did not explicitly declare a
as a Variant
. This lets is vanish against the other declarations all using an As Type
declaration.
I would like to add a few more pieces of advice to what @TinMan has already provided.
Naming
Good naming is one of the most useful and at the same time hardest things in programming. When you have to come back to your procedure 6 months in the furture, you will thank yourself for using descriptive names. In this case, the procedure is relatively short so that understanding what it does is not too hard, but in more complicated code, good names can make a huge difference.
Because of this, my advice is to go beyond the advise @TinMan has provided and to generally use descriptive names. (It is no problem if they get longer in the process.) E.g. the name rng
does not really tell you anything about what it is; calling it sourceRange
would certainly convey its purpose better.
Single Responsibility Principle
Another good guiding principle is the so called single responsibility principle. In short, it says that a usint of code should always only be responsible for one and only one thing. This makes it a lot easier to understand and modify code.
Getting this right is rather hard. However, there is a one rough guidline that can help: whenever you have the urge to add a header comment, you probably want to extract a procedure or function for whatever is done in the section.
In your case, one responsibility is to know how to extract values from a range. You could define a function to do that as follows.
Private Function DistinctValues(inputRange As Excel.Range) As Collection
Dim allValues() As Variant
allValues = inputRange.Value
Dim uniqueValues As Collection
Set uniqueValues = New Collection
Dim currentValue As Variant
For Each currentValue In allValues
On Error Resume Next
uniqueValues.Add currentValue, currentValue
On Error Goto 0
Next
Set DistinctValues = uniqueValues
End Function
If you ever want to change how to get distinct values, e.g. using a Scripting.Dictionary
, you just have to change it in this one place. Moreover, your code is easier to understand if instead of
var = Range("B2:B7")
For Each a In var
arr.Add a, a
Next
you have
Dim certificatesRange As Excel.Range
Set certificatesRange = Range("B2:B7")
Set distinctCertificates = DistinctValues(certificatesRange)
You could also separate the actual assembling of the new table into a procedure that takes some base point, e.g. the upper-left corner of the target range, a collection of column headers, one of row headers and the data source range.
With this, let us call it BuildPivotTable
, your top procedure would look something like this:
Public Sub PivotData
Dim sourceRange As Excel.Range
Set sourceRange = Range("A2:C7")
Dim columnHeadersRange As Excel.Range
Set columnHeadersRange = Range("B2:B7")
Dim distinctColumnHeaders As Collection
Set distinctColumnHeaders = DistinctValues(columnHeadersRange)
Dim rowHeadersRange As Excel.Range
Set rowHeadersRange = Range("A2:A7")
Dim distinctRowHeaders As Collection
Set distinctRowHeaders = DistinctValues(rowHeadersRange)
Dim targetRange As Excel.Range
Set targetRange = ActiveSheet.Range("E1")
BuildPivotTable targetRange, distinctColumnHeaders, distinctRowHeaders, sourceRange
End Sub
This separates getting the data from doing things with it. Should you want to change where you get your data, you will no longer have to find the appropriate passages between the code doing things with the data. You can even further enhance this by extracting the passages getting the various ranges into their own functions responsible to know where to get the particular data from.
Again, this is not that cruicial for this size of method, but as things grow larger, which tends to happen rather fast whan adding functionality to things, a good separation of responsibilities can help a lot; it certainly justifies the extra code you have to write to achieve it.
Do not Reuse Variables For Different Things
In your code, you first use the variable arr
for distinct names, then for distinct certificates. This makes it harder to follow what the the collection alrady contains and actively hinders good nameing: you cannot name it after what its purpose is if it has multiple ones.
It really does not cost a lot to generate yet another object. So such micro-optimizations should be avoided in favor of ease of reading the code.
Declare Variables Close to Their First Usage
Somewhat related to the last point, it is usually better to declare variables as close to their first usage as possible. This has the advantage that you cannot accidentaly add an access to the variable before that point (The compiler will yell at you.) and that you can be sure at that point that you have a clean object. E.g. a collection will not already contain something.
I know that this contradicts VBA style guids. However, those have been written in the 90s and which practices are deemed useful has involved in the past two decades.
Note that this guideline is much less relevent in short methods following the single responsibility principle.
Data Input
After the general pieces of advice applicable to basically all programming languages, let me come to some more Excel and VBA specific point.
@TinMan already pointed out to possible enhancemets to make the code work with dynamic ranges of data. I would like to add two options: names ranges and list objects.
Named Ranges
I am sure you are aware that you can name ranges in Excel either by writing in the address field in the top left or using the names manager in the formulas tab. You can use these names to specify ranges. If you define a named range Names
as A2:A7
, you can get the range for your names via Worksheets("Sheet1").Range("Names")
, which makes you independent from the specific design of the source sheet.
Tables
Even better would be to turn the input range into an Excel table, which we will call SourceTable
. Then, you can access it as a ListObject
via WorkSheets("Sheet1").ListObjects.Item("SourceTable")
. Moreover, if you add another row, it will simply expand to also contain the new row. This is a lot more convenient than the approach with names ranges, which have to be adjusted to deal with new rows at the bottom.
You can Save the Sheet You Are Working With in a Variable
@TinMan already suggested using a With
block to hold the worksheet you are working with. However, that will not work well when you have to pass it to some method. Instead, you can simply Set
assigne the worksheet to a varable of type Excel.Worksheet
.
Explicitly Calling Default Members
Default members are members on objects that get used automatically when the object is used in a Let
assignmant, i.e. an assignment without the Set
keyword. They are a source of a lot of surprising behaviour, and bugs. Thus, you should always prefer to call the curresponding member explicitly. For Range
, this means using Range.Value
. In Cells(l + 1, 5) = arr(l)
you actually call Cells(l + 1, 5).Value = arr(l)
, provided arr(l)
contains a value type. If it contained a Range
. the call would translate to Cells(l + 1, 5).Value = arr(l).Value
.
Iterating Collections
Since there is not too much data in this example it is not rally a performance problem, but Collection
s are not designed to be iterated using indized. You should use a For Each
loop instead. Unfortunately, it is not possible to use value types as the type for the item to pick from the loop. However, every object type and Variant
will work.
As New is Usually Not a Good Idea
You actually use the capabilitis of the As New
declaration arr As New Collection
, which is not seen to often. This declaration has the rather surprising effect of implicitly adding If arr Is Nothing Then Set arr = new Collection
in front of every access to arr
.
Although this can be used here to clear the variable by setting it to Nothing
, it would be much clearer to simply set it to New Collection
instead.
Because this behaviour is surprising to most poeaple, I would generally advise against using As New
declarations. Moreover, it hurts performance a bit because of the constant checks against Nothing
.
Declare the Type of All Variables
It is already good to see that you declared the type of nearly all variables. However, you did not explicitly declare a
as a Variant
. This lets is vanish against the other declarations all using an As Type
declaration.
answered 8 hours ago
M.Doerner
97638
97638
Good Stuff! FYI: I decided to writePivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use ofWith Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.
– TinMan
7 hours ago
add a comment |
Good Stuff! FYI: I decided to writePivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use ofWith Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.
– TinMan
7 hours ago
Good Stuff! FYI: I decided to write
PivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use of With Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.– TinMan
7 hours ago
Good Stuff! FYI: I decided to write
PivotDataIndexedArray()
as a single subroutine because I felt that passing the ranges to another function might be a little confusing, in conjunction with, my use of With Statements
. In retrospect, I probably should have used some temp variables to pass the information and explained the SRP myself.– TinMan
7 hours ago
add a comment |
up vote
0
down vote
You can try something along these lines where you nest your formula calls:
Range("F2").FormulaArray = _
"=IFERROR(" _
& "INDEX(" _
& "R2C3:R7C3," _
& "MATCH(" _
& "1," _
& "((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C))," _
& "0))," _
& """"")"
But the way you have it currently is not that bad
Also would highly suggest wrapping code into a With
block and adding .
's to your Ranges
and Cells
to protect against errors from bad references.
New contributor
add a comment |
up vote
0
down vote
You can try something along these lines where you nest your formula calls:
Range("F2").FormulaArray = _
"=IFERROR(" _
& "INDEX(" _
& "R2C3:R7C3," _
& "MATCH(" _
& "1," _
& "((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C))," _
& "0))," _
& """"")"
But the way you have it currently is not that bad
Also would highly suggest wrapping code into a With
block and adding .
's to your Ranges
and Cells
to protect against errors from bad references.
New contributor
add a comment |
up vote
0
down vote
up vote
0
down vote
You can try something along these lines where you nest your formula calls:
Range("F2").FormulaArray = _
"=IFERROR(" _
& "INDEX(" _
& "R2C3:R7C3," _
& "MATCH(" _
& "1," _
& "((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C))," _
& "0))," _
& """"")"
But the way you have it currently is not that bad
Also would highly suggest wrapping code into a With
block and adding .
's to your Ranges
and Cells
to protect against errors from bad references.
New contributor
You can try something along these lines where you nest your formula calls:
Range("F2").FormulaArray = _
"=IFERROR(" _
& "INDEX(" _
& "R2C3:R7C3," _
& "MATCH(" _
& "1," _
& "((R2C1:R7C1=RC5)*(R2C2:R7C2=R1C))," _
& "0))," _
& """"")"
But the way you have it currently is not that bad
Also would highly suggest wrapping code into a With
block and adding .
's to your Ranges
and Cells
to protect against errors from bad references.
New contributor
New contributor
answered 16 hours ago
Kubie
1113
1113
New contributor
New contributor
add a comment |
add a comment |
Michal Rosa is a new contributor. Be nice, and check out our Code of Conduct.
Michal Rosa is a new contributor. Be nice, and check out our Code of Conduct.
Michal Rosa is a new contributor. Be nice, and check out our Code of Conduct.
Michal Rosa is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207946%2fpivoting-data-with-vba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown