Ottawa - December 7, 2014 - I find it astounding that there appears to be a near complete absence of social scientific analysis in contemporary hockey analytics. Social sciences are concerned with both individual human traits and characteristics (psychology) and larger group dynamics, structure, and behavior (sociology). The sport of hockey is organized and played by individuals who have motivations and drives. Those players are grouped into teams, which are complex subcultures that have sets of important informal rules and practices. However, although the social sciences appear to have a clear contribution to make in hockey analytics (e.g. what types of team dynamics work best? How much influence do different leadership styles have?), aside from a very small number of pieces such as this one, social scientists have been largely invisible to this point in the development of hockey analytics. The key question I have is: why have the social sciences been absent from the development of hockey analytics to this point in time?
Before diving in to try to answer that question, I want to quickly note that this post builds on two of my previous blog entries. Toward the end of my post Chance and Variance I introduced a distinction between “causal determinism” and “Cartesian dualism” that forms a type of background to some of this discussion. That, in turn, fed into my post about the error term in regression analyses. The long and the short of it is that when models are built in contemporary hockey analytics, the error term, if it is mentioned at all, tends to be attributed solely to chance. In essence, when our observed measures are pumped into regression equations, the amount that is not explained becomes a “bullshit dump” of micro-level bounces and small-scale empirical factors that we cannot hope to measure.
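To make the “bullshit dump” point concrete, here is a minimal sketch (all variable names and effect sizes are made up for illustration, not taken from any real hockey data). It simulates an outcome driven by an observed predictor, a latent “intangible,” and genuine chance. Regressing on the observed predictor alone, the latent factor and the chance component get lumped together in the residual, and nothing in the regression output distinguishes them:

```python
import random

random.seed(42)

# Hypothetical toy model: an observed input (shots) plus a latent
# team-cohesion factor both drive the outcome, along with pure noise.
n = 10_000
shots    = [random.gauss(30, 5) for _ in range(n)]
cohesion = [random.gauss(0, 1) for _ in range(n)]   # latent "intangible"
noise    = [random.gauss(0, 1) for _ in range(n)]   # genuine chance
y = [0.1 * s + 1.0 * c + e for s, c, e in zip(shots, cohesion, noise)]

def ols_residual_var(x, y):
    """Fit y = a + b*x by ordinary least squares; return residual variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return sum(r * r for r in resid) / len(resid)

# Regressing on shots alone dumps the latent cohesion factor into the
# "error": residual variance is roughly var(cohesion) + var(noise) = 2,
# even though only half of that is actually chance.
print(ols_residual_var(shots, y))
```

The residual here is about twice the size of the true chance component, which is the essence of the problem: unmeasured human factors and luck are indistinguishable once they land in the error term.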
I have not stumbled across anyone else who has taken the time to map out how “intangibles” can be included in the models we use to describe the sport we all love. Maybe there are a few people out there of whom I am not currently aware. The feeling of being a voice in the wilderness in terms of applying social science concepts to hockey analytics is what drove me to write this. If you know of anyone who is doing similar work, please send me a message and direct me to that person.
This post will start by mapping out some important distinctions in the ways social scientists plot out the “human element” before turning to why I believe social science has largely been absent from analytics to this point.
The Human Factor
I think the best place to start is to look at how analyses focusing on human beings tend to work. The primary difference between the natural and social sciences is that the latter has to take human interpretation into account. Models of human behavior have to acknowledge that humans, as rational (or irrational, depending on your theoretical orientation) actors, continually interpret the world around them. Below is a very simple model that illustrates how, in social research, interpretation fits between stimulus and response.
Simple Model Behavior
Just to avoid taking the easy road here and painting too simplistic a picture, I want to stress that disagreements do exist regarding how human interpretation fits into actual research and analysis. For example, behaviorism (e.g. B.F. Skinner, John Watson) is marked by the assumption that human interpretation is unknown and unknowable. This is a simple model of human behavior from a behaviorist perspective:
Simple Model Behaviorism modification
The example I like to use for behaviorism is Alfred C. Kinsey’s studies of male (1948) and female (1953) sexuality. Kinsey, who was a zoologist by training, produced books about sexuality that were each several hundred pages long, and that featured many tables systematically organizing assorted statistics: rates of emission by assorted groups in assorted circumstances. However, he never looked at what those behaviors meant, and as such the books were really about rates of intercourse and emission. The work was profound at the time because it challenged preconceived notions about what is “normal” in a statistical sense, where normal = common. However, if you ask yourself whether counting how often you experience orgasm in different ways really captures your sexuality as a whole, you can easily identify the limitations of this type of approach.
Acknowledging the notable exception of behaviorism and a few similar perspectives, I think it is safe to say that most social scientists, particularly those coming from psychology and sociology backgrounds, will try to take human interpretation and perspective into account. Such human factors essentially become mediators between cause and effect. In terms of hockey analytics, the human factor is commonly described (often with no small measure of derision) as “intangibles.” Such “intangibles” are the essence of social science research. As I mentioned in my post on latent variables, there are literally thousands of articles that quantitatively measure things like leadership. In terms of hockey analytics, intangibles, which are presented in a circle because they are always latent variables, fit into a slightly more complex model of human behavior as follows:
Again, just to avoid cutting too many corners, I want to point out that although there is no debate among social scientists regarding whether the human element exists in the ways outlined above, there is a very long-standing debate regarding how to research this human element. The key division is between quantitative and qualitative researchers. The former seek to quantify key parts of the human psyche, behavior, interaction, etc. To be empirically sound, researchers pay a lot of attention to psychometric validation of the measures being used. It is a very strict process, and results are often validated over hundreds, and sometimes thousands, of iterations.
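One small, standard piece of that psychometric toolkit is internal-consistency checking. As a sketch (the scale items and scores below are invented, not from any real instrument), here is Cronbach’s alpha, a common first check on whether the items of a questionnaire scale hang together well enough to be summed into a single score:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per questionnaire item, all of equal length
    (one score per respondent). Uses sample variances (n - 1 denominator).
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Toy 3-item "leadership" scale answered by 5 players (made-up numbers):
scale = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
print(round(cronbach_alpha(scale), 2))  # about 0.89 for these toy scores
```

A rule of thumb often cited in the psychometric literature is that an alpha of roughly 0.7 or above is acceptable for a research scale, though validation involves far more than this one statistic.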
On the flip side of this coin are qualitative researchers, who think that it is blasphemy to quantify the human experience. From the perspective of qualitative researchers, the individual stories we share should form the basis of systematic analysis. And yes, good qualitative research is painstaking and systematic. The basic gist of the argument is that the individual stories quantitative researchers try to remove from analysis through aggregation of data are the exact dimensions of human behavior that are most important. Furthermore, sometimes the best way to get an answer is to just ask someone who is in a position to know. They may give responses that are imperfect, but there will also be some gems in there that would never be found if we look at too big a picture.
Research at the End of Theory (and Beyond)
The above overview of dominant theoretical models of human behavior found in the social sciences is very brief, and there are many more possibilities. I focused on the highlights. I think it is very important to also add one other type of model that is becoming increasingly common in the social sciences. It is not theory-based, so I did not make up a graphic to accompany it. This model focuses on measures without any theoretical backdrop. Fields such as health and a large portion of criminal justice typically examine numbers using descriptive statistics and population estimates to see if patterns emerge. In health, known correlates of health form the typical grouping variables, and health outcomes are measured along those lines. So we have cancer rates, obesity rates, etc., by gender, income level, region, etc. The known determinants are established, so research focuses on whether rates are going up or down and then goes from there.
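This atheoretical, descriptive style of analysis is mechanically simple: compute an outcome rate within each level of a known grouping variable and watch how it moves. A minimal sketch (the records and field names below are invented placeholders):

```python
from collections import defaultdict

# Descriptive, theory-free analysis: outcome rates by a known grouping
# variable, with no causal model behind the comparison.
records = [
    {"region": "East", "obese": 1},
    {"region": "East", "obese": 0},
    {"region": "West", "obese": 0},
    {"region": "West", "obese": 0},
    {"region": "West", "obese": 1},
]

def rate_by(records, group_key, outcome_key):
    """Mean of a 0/1 outcome within each level of a grouping variable."""
    counts, hits = defaultdict(int), defaultdict(int)
    for r in records:
        counts[r[group_key]] += 1
        hits[r[group_key]] += r[outcome_key]
    return {g: hits[g] / counts[g] for g in counts}

print(rate_by(records, "region", "obese"))
```

The output here is a rate per region (0.5 for East, one-third for West in this toy data); the research question then becomes whether those rates are trending, not why the underlying behavior occurs.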
Where is the Social Science?
The most obvious answer to the question of why social science has not influenced hockey analytics is that the data we have is not conducive to social science research. In other words, we do not have data that fits into most of the models above. The exceptions are behaviorism, which makes no attempt to measure human factors, and atheoretical research. People who have read this blog for a while know that the work I am doing is largely theoretical. This is mostly by choice. I am not looking to get hired by anyone, so I write about what interests me rather than trying to attract attention for the purpose of auditioning for NHL clubs. However, the data I would need to run the associated analyses for most of what I am discussing in this blog is simply not available to the general hockey community. As a result, I can tell you where leadership fits into regression models, and how determination would fit as a mediator. I cannot say how big an influence leadership has on team results, and I cannot test whether determination actually mediates a given relationship.
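To show what that untestable mediation model would look like if the data existed, here is a sketch on simulated data (every variable name and path coefficient is hypothetical; nothing here is estimated from real hockey data). It follows the classic regression-based logic of mediation analysis: estimate the path from predictor to mediator, the path from mediator to outcome, and compare against the total effect:

```python
import random

random.seed(1)

# Hypothetical mediation chain: leadership -> determination -> results.
n = 5_000
leadership    = [random.gauss(0, 1) for _ in range(n)]
determination = [0.6 * l + random.gauss(0, 1) for l in leadership]      # path a
results       = [0.5 * d + random.gauss(0, 1) for d in determination]   # path b

def slope(x, y):
    """Simple OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

a = slope(leadership, determination)   # estimate of path a, ~0.6
b = slope(determination, results)      # estimate of path b, ~0.5
total = slope(leadership, results)     # total effect of leadership on results

# With no direct leadership -> results path in the simulation, the total
# effect is carried entirely through the mediator: total ~ a * b ~ 0.3.
print(a, b, total)
```

With real player and team data, the same structure, with proper multiple-regression controls and validated measures in place of these toy variables, is exactly the kind of model that cannot currently be fit from publicly available hockey data.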
Although I rationally know that this is probably the right answer, something about it just does not sit well with me. If social scientists were really interested in the topic of leadership in hockey, for example, I am sure that someone could convince a junior team to provide access to players. The appropriate data could be collected, and new models could be built using the approaches that I am trying to systematically cover in this blog. Sure, this requires effort, but no more so than was required from the key individuals in the analytics movement as they began to systematically code and collect new forms of data.
My best guess (and I reserve the right to change my mind down the road) as to why social science has not been a factor in the development of hockey analytics is that the collection of information about intangibles requires a different type of commitment than the collection of data surrounding observed variables. An individual who has access to a television, and the ability to record a game, can code out things like Corsi from the comfort of his or her own home. To get information about variables such as leadership, a researcher would have to follow a team and spend time with its players during games, practices, and in the locker room.
There are probably hundreds of social scientists in Canada, and many more in the United States, who have the correct skill set to collect outstanding information about “intangibles,” and who could measure the types of theoretical associations I outlined in the first half of this post. However, individuals in academia who can do this type of work may not be drawn to the topic for fear that it will be viewed as a “fluff” topic that will not be taken seriously as they compete for tenure. Also, academics are often hired with very specific research streams in mind. Building up hockey analytics would be a side topic for scholars who are already pressed for time. Lastly, academics will gravitate to research that draws funding to the university. This is the way academia works, by and large. Unless someone wants to fork over hundreds of thousands of dollars for research into expanding hockey analytics, this topic is a non-starter from an academic perspective.
Some individuals with the correct skill sets work in positions outside of academia. I conduct large scale research projects, and do many types of data analysis, for a living. I am in an even worse position than academics to go out and collect the much-needed types of data that are currently absent in analytics because I would have to take months of unpaid time off of work to do so. Furthermore, I would be given no credit for this extra work in terms of my own professional advancement. It simply won’t happen, unless you have someone with a similar skill set to mine who is looking for a retirement project.
Regardless of why social sciences are absent from current hockey analytics, the result is the same. “Intangibles” are a punch line in some portions of the analytics community. The 40%, give or take, of outcomes that we cannot predict is presented as chance or, even better, an artifact of the inherent complexity of the game due to all that puck-bouncing and line-changing stuff.
As a general rule, looking at data in as many different ways as possible, and from as many different perspectives as possible, provides the best results. This is especially true when it comes to model building. I see huge and obvious holes in current analytics, but I am not sure what to do about it.
I’m sure this is not the most uplifting ending to a post you will ever see, but it is honest. This is where we are, at least in terms of adding a dose of social sciences to the mix.