How to resize and convert image files
Data science students need to be able to:
- Create and use informative file names
- Resize image files
- Convert from one image format to another
After a brief history, this page includes help with accomplishing these three tasks.
Background
If, like me, you started with mainframes and early personal computers in the early 1980s, then you learned about file size limits right along with use of apps (we called it software), the limits on file names, and many of the other rules one needed to be aware of to navigate those early days. In those days, disk storage was measured in thousands of bytes, not the gigabytes and terabytes of today. My first personal computer, a Zenith Z-158, came with a 20MB hard drive, and cost about $2500 in 1986 (today’s dollars about $6000), and did not include a monitor. By contrast, my new HP All-in-one desktop came with a terabyte hard drive and, of course, the HD monitor at a cost of less than $800 (about $300 in the early 1980s). Sorry for the memory lane reminiscence.
And yet the spirit of the rules established in the early days of personal computing is still present, though generally hidden from the majority of computer users. I’ll just comment that hiding all of this is a significant improvement in user experience; it’s just that at times we need to know about and manipulate files on our computers.
Think of the file size of a 500-word document (see random text below). Even on modern servers, disk storage is limited, and so is bandwidth (and we pay for both, actually). For example, our university uses CANVAS as its Learning Management System. Students are allotted a total of 50 MB of storage for all classes, and files carry over from previous semesters and courses. 50 MB is a trivial amount today: images taken on Android phones or iPhones now are 5 to 8 MB, meaning fewer than a dozen images can be saved to a CANVAS storage folder in a student account. Although CANVAS distinguishes between files attached as part of assignments and general files uploaded by students, students may quickly find they have run out of storage space.
We’ll return to file size in a moment. The relevant data science learning objective is that the material described on this page should be part of the initial meetings used to describe how a data science project will be managed. While the first meetings will cover the big picture (define the problem, identify and characterize a solution, and outline an approach), we will also need to define in advance, and agree to, several necessary identification features (metadata, variable definitions) and a file naming scheme as part of the data management tasks.
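The quota arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name images_that_fit is hypothetical, and the calculation assumes the whole 50 MB quota is free and every image is the same size.

```python
def images_that_fit(quota_mb, image_mb):
    """Return how many images of image_mb megabytes fit in quota_mb megabytes."""
    # Floor division: a partially uploaded image does not count.
    return int(quota_mb // image_mb)


# Phone images of 5 to 8 MB against a 50 MB quota.
for image_mb in (5, 8):
    print(f"{image_mb} MB images: {images_that_fit(50, image_mb)} fit")
# 5 MB images: 10 fit
# 8 MB images: 6 fit
```

Either way, the count is below a dozen, and it drops further once files from earlier courses are taken into account.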
First up, file names.
File names
The rules for file names follow from how the operating system mapped names to storage locations. On MS-DOS systems, filenames were restricted to eight characters plus a three-character file extension, separated by a single period. We were taught to create informative file names within those character limits. In contrast, the first Macintosh computers allowed file names up to 255 characters long, similar to UNIX (and modern Linux systems). However, the length included the pathname, too, so the actual filename in practice would be much shorter.
Today, modern computer systems allow use of long filenames and very large files. That does not mean we’re better off.
For example, the default macOS filename for a screen capture looks like
Screen Shot 2022-03-11 at 1.58.05 PM.png
Note the spaces (five), dashes (two), and multiple periods (three). The three-letter file extension following the final period, originally used to distinguish between file types, is retained today as a legacy of older systems, and is still used by your operating system to assign files by type to apps (png in this example). Because modern systems open files by “clicking,” the file extension remains the system’s first-level attempt to open the file with the correct application. Thus, filenames with more than one period should be avoided; it is possible that the OS will confuse periods in the filename with the separator between the name of the file and the file extension.
Note how uninformative the file name is, despite its 40-character length. “Screen Shot” of what? And it’s likely redundant to include the date and time in the filename; date and time are associated with the file by the operating system. See Please create good file names for more details about proper naming of files.
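To see how software typically separates a name from its extension, here is a minimal Python sketch using the standard library. The helper name describe_filename is hypothetical; os.path.splitext splits at the last period, which is one common convention, and other tools may behave differently.

```python
import os


def describe_filename(name):
    """Split a filename at its last period and count troublesome characters."""
    stem, extension = os.path.splitext(name)
    return {
        "stem": stem,
        "extension": extension,
        "spaces": name.count(" "),
        "dashes": name.count("-"),
        "periods": name.count("."),
    }


info = describe_filename("Screen Shot 2022-03-11 at 1.58.05 PM.png")
print(info["extension"])  # .png
print(info["spaces"], info["dashes"], info["periods"])  # 5 2 3
```

Python gets this example right because it only looks at the last period, but a tool that splits at the first period would treat everything after "1" as the extension.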
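One way to build a tidier name is sketched below. The convention used here (lowercase, underscores instead of spaces and punctuation, a single period before the extension) is an assumption for illustration, not a rule from this page, and make_filename is a hypothetical helper; follow whatever naming scheme your team or instructor has agreed on.

```python
import re


def make_filename(description, extension):
    """Build a filename from a short description and a file extension."""
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    # Normalize the extension so exactly one period separates it from the name.
    ext = extension.lower().lstrip(".")
    return f"{slug}.{ext}"


print(make_filename("Lab 3: scatter plot", ".PNG"))  # lab_3_scatter_plot.png
```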
File formats
It is a routine request by instructors: submit files of a specific format. Generally, we’ll try to accommodate the file types students are likely to use in default mode with apps, but this is not always possible. Those glorious HEIC files your iPhone has taken since iOS 11? Or the WebP format, developed by Google to replace older image formats like jpg and png? CANVAS can’t read them (png and jpg only).
Now, before the reader remarks, “the instructor can just convert using…” that’s missing the point. In data science, we share files with various team members as a matter of course during a project. If each team member needs to convert to a different format, this risks loss of information, and it’s a waste of time spent on an activity that does not directly contribute to moving the project forward.
Thus, before submitting images to CANVAS, students will need to convert them to an acceptable format. This can be accomplished on the phone directly (e.g., if you send an image file attached to a text or email, iOS automatically converts it from HEIC to jpg), or you can change the default image file type in your camera settings. macOS Preview works with HEIC and can be used to export to jpg format. Android phones (Samsung, Google) still use jpg as the default, so file conversion is not an issue.
But the size of the file will be an issue, regardless of whether the camera is from Apple or an Android vendor.
What about text documents? Consider a 579-word document (I used a plain text random text generator). Take a look at the range of file sizes by application on a Windows PC:
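A quick check of whether a file needs converting before upload might look like the following Python sketch. The accepted set follows the "png and jpg only" statement above; including the .jpeg spelling is an assumption, and needs_conversion is a hypothetical helper. The check looks only at the extension, not at the actual file contents.

```python
from pathlib import PurePath

# Extensions assumed acceptable for upload (jpg and png per this page;
# ".jpeg" is added here as an assumption).
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def needs_conversion(filename):
    """Return True if the file extension is not in the accepted set."""
    return PurePath(filename).suffix.lower() not in ACCEPTED_EXTENSIONS


for name in ("IMG_0042.HEIC", "photo.webp", "lab_3_scatter_plot.png"):
    print(name, needs_conversion(name))
# IMG_0042.HEIC True
# photo.webp True
# lab_3_scatter_plot.png False
```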
Extension | Application | Size (kB) |
txt | Notepad | 4 |
doc | LibreOffice Writer | 17 |
doc | Microsoft 365 | 29 |
docx | LibreOffice Writer | 8 |
docx | Microsoft 365 | 16 |
odt | LibreOffice Writer | 59 |
odt | Microsoft 365 | 8 |
Note: one takeaway from the table is that file size can vary between apps, even for the same file type.
File size
Image files from iPhones and Samsung phones are routinely three or more megabytes in size. Moreover, iPhones now default to the HEIC file format for images, which permits higher quality images than traditional jpeg files of comparable size.
However, attaching a 3 or 4 MB image file to a web page is a no-no. Depending on a number of factors, the time to load and therefore display a large image file can run from many seconds to minutes, which is a real problem for Course Management Systems like CANVAS. In fact, CANVAS will time out and fail to load the image much sooner than a minute. So, even though CANVAS will allow students to upload large files, I cannot allow students to upload large files to my site. File limits will be clearly posted on the submit site for a homework or project file, with sizes ranging from 0.5 to 2 MB. You need to adhere to the restrictions; solving your file size issues is part of the assignment.
In general, file size problems come from large images. Thus, unless the exercise specifically asks for images, do not take pictures of your handwritten answers and then try to upload the unedited image files. Almost without fail, these will be larger than the size restriction for the assignment. Type your responses; text file sizes are small.
If the assignment calls for images, then you must edit your image files to stay within the size limits. Again, this is part of the assignment, and you cannot use the excuse that you do not understand how to accomplish the task. Make the image smaller and save it in a format CANVAS accepts (jpg or png; for photographs, jpg usually produces the smaller file). macOS comes with an excellent image editor called Preview. Simply open the file in Preview, then navigate to Tools > Adjust Size and reduce the image size. If you have Adobe Photoshop (commercial) or GIMP (free, open source) installed on your computer, then you have a great photo-editing application available to you. If you don’t have these, I recommend the online editor at www.pixlr.com. There are many more options. One local option for Microsoft PCs is to use Microsoft Paint, an app for working with screen shots; you can also use it to do simple edits like resizing image files. I use PicPick, a more advanced screen capture app for Windows, available at http://www.pickpick.org.
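Before uploading, you can check a file against the posted limit yourself. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes 1 MB means 1,000,000 bytes (some systems count 1,048,576 bytes instead, so leave yourself a margin).

```python
import os

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def size_in_mb(path):
    """Return the size of the file at path in megabytes."""
    return os.path.getsize(path) / BYTES_PER_MB


def within_limit(path, limit_mb):
    """Return True if the file is no larger than limit_mb megabytes."""
    return size_in_mb(path) <= limit_mb
```

For example, within_limit("lab_3_scatter_plot.png", 0.5) would tell you whether that (hypothetical) file fits the smallest limit mentioned above.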
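If you prefer a script to a point-and-click editor, the resizing step can be sketched in Python. This is a minimal sketch under stated assumptions, not a recommended workflow from this page: it relies on the third-party Pillow library (installed separately), the function names and the defaults of 1600 pixels and quality 85 are hypothetical choices, and opening HEIC files would need an additional plugin that is not shown here.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side pixels."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never enlarge
    # Keep the aspect ratio by scaling both sides by max_side / longest.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


def resize_to_jpeg(source, destination, max_side=1600, quality=85):
    """Shrink an image and save it as a JPEG file (requires Pillow)."""
    from PIL import Image  # third-party library, imported only when needed

    with Image.open(source) as image:
        new_size = fit_within(image.width, image.height, max_side)
        # JPEG has no transparency, so convert to plain RGB before saving.
        resized = image.convert("RGB").resize(new_size)
        resized.save(destination, format="JPEG", quality=quality, optimize=True)


print(fit_within(4032, 3024, 1600))  # (1600, 1200)
```

After running resize_to_jpeg, check the size of the new file; if it is still over the limit, try a smaller max_side or a lower quality value.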