Histogram from a spreadsheet pivot table

To do

  • Add figure legends

This page has 18 graphics and may take some time to load.

Pivot tables are great spreadsheet tool. Previously, I provided generic instructions for making pivot tables with Google Sheets (see Spreadsheet pivot tables). Here, a specific, worked example — protein lengths predicted for a DNA sequence — culminating in a histogram.

Google Sheets

Example for importing a text file (e.g., output from UGENE), managing the data, creating the pivot table, making the histogram

Import the text file (make sure file ends with .txt)

Start a new sheet, select File –> Import, select upload

Screenshot Google import file, upload

Figure 1.

Find and load file from your computer. Note I renamed the file I wanted to include a .txt file extension

Screenshot file browser

Figure 2.

Select Replace spreadsheet, then click Import data button

Screenshot import options

Figure 3.

Data now imported

Screenshot of import results

Figure 4.

Transform the data set to get the data we want

I deleted the first two rows, which in this case, do not include anything I need. I’ll leave the row that contains the “FEATURES    Location/Qualifiers” as the header row.

In this example I’m working with data exported from ORF finder UGENE. I want to generate a histogram of the protein lengths predicted for a DNA sequence. Thus, all I want from the output is data labeled “/protein_len.”

Next, proceed to manipulate and extract the data you need. I’ll introduce you to “filters”, a nice feature to help manage the data. Click on the header row, then select Data –> Create a filter. Note the inverted stack in our header row.

sheets_filter_01.png

Figure 5.

Since I only want /protein_len, set the filter by condition, Text contains and enter the filter condition “/protein” without the quotes

sheets_filter_02.png

Figure 6.

The result of filtering, only the rows with /protein_len will appear.

Next, I want to extract the numbers (length of the proteins) from the text. I’ll use a series of simple functions in series to extract and retrieve the numbers

value(A4) converts text in the cell A4 to number

right() extracts text from the right end of the cell value

len() determines the length of the string

find() will search the string for a character, then return the position of that character

In total, the command is

=value(right(a4,len(a4)-find("=",a4))))

sheets_filter_03.png

Figure 7.

Extend the function to the rest of the data,

Create the pivot table

Highlight the column

sheets_pivot_02.png

Figure 8.

Select Data –> Pivot table

sheets_pivot_03.png

Figure 9.

Select options, recommend creating a new sheet

sheets_pivot_04.png

Figure 10.

The basic pilot sheet is created.

sheets_pivot_05.png

Figure 11.

Click to Add rows, our Column B

sheets_pivot_06.png

Figure 12.

Next, we’re going to set up groups so that we can group the results into categories. Right click on any number in the pivot table, then select Create Pivot group rule

sheets_pivot_07.png

Figure 13.

I entered 20 for the minimum value, 400 for the maximum value, then 20 as the interval size. These values were selected because of the range of values in my protein length data set, yours may be different.

sheets_pivot_09.png

Figure 14.

We’re almost there. Our pivot dataset now has the values grouped. All we need now are the counts. In the pivot table edit menu select Values, and Add Column B.

sheets_pivot_10.png

Figure 15.

The default is SUM, select COUNT from the drop down list.

sheets_pivot_11.png

Figure 16.

Out frequency table is now completed.

sheets_pivot_12.png

Figure 17.

Make your histogram by selecting the two columns, then Insert chart

sheets_pivot_13.png

Figure 18.

Export the image by first selecting the chart, clicking on the three dot ellipse (upper right of chart image, see Fig 18), then from the popup menu, select Download chart > PNG image (or PDF if you prefer).

/MD