Chapter 2 Introduction

When I was a high-school student, biology was my favourite subject. Naturally, I needed a biology textbook to study from, and my school required me to buy one. My brother is seven years older than me, and he had the same textbook, albeit an outdated edition. Hence, I did not need to buy one; I just used his book. My classmates were using the newer edition, but I enjoyed reading my brother’s old edition. That book had over 1,000 pages and was entirely in black and white. Students and teachers alike spotted me reading that book almost all the time in the corner of the spectator stand. By the time I graduated, that 1,000-page book had fallen apart into multiple ‘volumes’ from my frequent reading.

Why do I bring this seemingly irrelevant little story up? Francis Crick and James Watson, together with the oft-ignored Rosalind Franklin, whose X-ray crystallography work made it possible, discovered the double-helix structure of DNA in 1953. By the time I was reading a biology textbook printed in the early 90s, that discovery was almost 40 years old. The textbook of the period introduced the double-helix structure of DNA in, as you can imagine, a boring way. Although the discovery of the double helix is probably one of the most important discoveries in science, the textbook of the period did not hype it. Instead, it was described as if it were as boring as counting sheep. Once something is written into a textbook, that something instantly becomes uncool.

But still, why do I bring this seemingly irrelevant story up? Some would dispute this, but the world’s first automated content analysis system is probably the General Inquirer. The original paper about the system was published in 1962 by the late Philip J. Stone and his colleagues at the Harvard Laboratory of Social Relations. The General Inquirer (GI) uses a method, now called the “dictionary-based method”, to quantify the characteristics of a piece of text: words in the text are matched against predefined lists of words representing categories of interest, and the matches are tallied.
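To make the idea concrete, the following is a minimal sketch of a dictionary-based method in Python. It is only an illustration: the two-category toy dictionary below is made up for this example, and the actual General Inquirer dictionary contains far more categories and words.

```python
# A minimal sketch of the dictionary-based method: count how many word
# tokens in a text match each category's predefined word list.
# The dictionary here is a toy example, not the General Inquirer's.
import re

toy_dictionary = {
    "positive": {"good", "great", "happy", "excellent"},
    "negative": {"bad", "sad", "terrible", "poor"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def dictionary_count(text, dictionary):
    """Tally tokens in `text` that appear in each category's word list."""
    tokens = tokenize(text)
    return {
        category: sum(token in words for token in tokens)
        for category, words in dictionary.items()
    }

print(dictionary_count("A great book, but the print quality is poor.",
                       toy_dictionary))
# {'positive': 1, 'negative': 1}
```

Conceptually, this is all a dictionary-based method does; most of the real work lies in constructing and validating the dictionary itself.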

At the time of writing, it is 2020. The world’s first automated content analysis system is thus 58 years old. If there is still an innovation factor in using a 58-year-old technology, let alone hype about it, that is absurd. Although new dictionaries are developed almost every month, many communication researchers, myself included, are doing automated content analysis in a way very similar to how Philip J. Stone and colleagues did it back in 1962. The discourse about automated content analysis should therefore resemble the way I read about the double-helix structure as a high-school student in the mid-90s: it should be boring and uncool.

The motto of this book is simple: making automated content analysis uncool again. I must admit that I stole this motto from another project: the folks at fast.ai have used the motto “Making neural nets uncool again” since the website’s inception. The founders of the MOOC site feel that the hype around deep learning and artificial intelligence is very unhealthy. They say in their mission statement that being cool is about being exclusive. They want to make deep learning as accessible as possible, including to those using uncool operating systems such as Windows and those with uncool backgrounds (e.g. people who did not go to Stanford).

As a person writing this book in an uncool text editor (Emacs, a now 44-year-old technology), I agree with their vision. I want to make automated content analysis so accessible that there is no more hype around it. Everyone can do it, and everyone can then do it correctly. When everyone, cool and uncool people included, can do automated content analysis by reading this open-access book, we can move on to more important questions, such as: are we doing automated content analysis in a way that has adequate validity and reliability?

I hope you will enjoy reading this book.