Histogram from a spreadsheet pivot table
To do
- Add figure legends
This page has 18 graphics and may take some time to load.
Pivot tables are great spreadsheet tool. Previously, I provided generic instructions for making pivot tables with Google Sheets (see Spreadsheet pivot tables). Here, a specific, worked example — protein lengths predicted for a DNA sequence — culminating in a histogram.
Google Sheets
Example for importing a text file (e.g., output from UGENE), managing the data, creating the pivot table, making the histogram
Import the text file (make sure file ends with .txt)
Start a new sheet, select File –> Import, select upload
Figure 1.
Find and load file from your computer. Note I renamed the file I wanted to include a .txt file extension
Figure 2.
Select Replace spreadsheet, then click Import data button
Figure 3.
Data now imported
Figure 4.
Transform the data set to get the data we want
I deleted the first two rows, which in this case, do not include anything I need. I’ll leave the row that contains the “FEATURES Location/Qualifiers” as the header row.
In this example I’m working with data exported from ORF finder UGENE. I want to generate a histogram of the protein lengths predicted for a DNA sequence. Thus, all I want from the output is data labeled “/protein_len.”
Next, proceed to manipulate and extract the data you need. I’ll introduce you to “filters”, a nice feature to help manage the data. Click on the header row, then select Data –> Create a filter. Note the inverted stack in our header row.
Figure 5.
Since I only want /protein_len, set the filter by condition, Text contains and enter the filter condition “/protein” without the quotes
Figure 6.
The result of filtering, only the rows with /protein_len will appear.
Next, I want to extract the numbers (length of the proteins) from the text. I’ll use a series of simple functions in series to extract and retrieve the numbers
value(A4) converts text in the cell A4 to number
right() extracts text from the right end of the cell value
len() determines the length of the string
find() will search the string for a character, then return the position of that character
In total, the command is
=value(right(a4,len(a4)-find("=",a4))))
Figure 7.
Extend the function to the rest of the data,
Create the pivot table
Highlight the column
Figure 8.
Select Data –> Pivot table
Figure 9.
Select options, recommend creating a new sheet
Figure 10.
The basic pilot sheet is created.
Figure 11.
Click to Add rows, our Column B
Figure 12.
Next, we’re going to set up groups so that we can group the results into categories. Right click on any number in the pivot table, then select Create Pivot group rule
Figure 13.
I entered 20 for the minimum value, 400 for the maximum value, then 20 as the interval size. These values were selected because of the range of values in my protein length data set, yours may be different.
Figure 14.
We’re almost there. Our pivot dataset now has the values grouped. All we need now are the counts. In the pivot table edit menu select Values, and Add Column B.
Figure 15.
The default is SUM, select COUNT from the drop down list.
Figure 16.
Out frequency table is now completed.
Figure 17.
Make your histogram by selecting the two columns, then Insert chart
Figure 18.
Export the image by first selecting the chart, clicking on the three dot ellipse (upper right of chart image, see Fig 18), then from the popup menu, select Download chart > PNG image (or PDF if you prefer).
/MD