Mentions of "Cigarette" and "Smoking" in *The College News*

Relative Frequency Visualized in a Bar Graph

Linda Chen '23

I have always been interested in potential trends and patterns within our voluminous corpus, so at the initial stage of my individual project, I ran a few keyword searches through the whole corpus using regular expressions and Python. I chose to look into the frequency of “cigarette” and “smoking” mostly because, during the project team’s initial exploration of *The College News*, we discovered a large number of tobacco advertisements. A keyword search for “cigarette” and “smoking” returned over 2,800 results, a shockingly high number to my generation given how well known tobacco’s harm to health is today. After an informal discussion with the project team, I realized that the frequency of smoking-related terms in our corpus might correlate with national policies concerning the tobacco industry during the run of *The College News* (1914-1968). To analyze the search results further, I imported the CSV file containing them into Pandas, a Python library for data analysis, as a data frame, and calculated the relative frequency of “smoking” and “cigarette” by dividing their yearly counts by the total word count of all issues published that year. Finally, I graphed the data using Altair, a Python library for interactive data visualization. I chose a bar graph with year on the x-axis and relative frequency on the y-axis because I hoped it would reveal patterns and trends in the fluctuation of cigarette-related content over time. As the graph shows, mentions of cigarettes and smoking declined steeply throughout the 1960s, with a notable nadir in 1964, potentially as a result of the Surgeon General’s 1964 report on smoking and health, which highlighted the negative effects of tobacco.
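The relative-frequency calculation described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library rather than the Pandas and Altair workflow used in the project, and everything in it is an assumption made for the example: the tiny `issues` list, the regular expression (which also matches the plural “cigarettes”), and the `relative_frequency` helper are hypothetical and not taken from the project's code or data.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the corpus: (year, issue text) pairs.
# The real project worked from a CSV of search results instead.
issues = [
    (1950, "Smoking is allowed in the smoking room. Buy a cigarette today."),
    (1950, "The lecture on Tuesday drew a large crowd."),
    (1964, "A report on smoking was discussed at the college this week."),
]

# Assumed keyword pattern: whole-word, case-insensitive matches.
KEYWORDS = re.compile(r"\b(?:cigarettes?|smoking)\b", re.IGNORECASE)
# Simple word tokenizer used for the yearly total word count.
WORDS = re.compile(r"\b\w+\b")


def relative_frequency(issues):
    """Return {year: keyword hits / total words} across all issues in that year."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in issues:
        hits[year] += len(KEYWORDS.findall(text))
        totals[year] += len(WORDS.findall(text))
    # Skip any year with no words to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


freq = relative_frequency(issues)
for year, value in freq.items():
    print(year, round(value, 4))
```

Dividing by the yearly word count, rather than plotting raw counts, keeps years with more or longer issues from appearing to mention smoking more often simply because more text was published.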