A Position on Peer Reviewing in HCI, part 3

jeffreybardzell January 28, 2012

This post is continued from two earlier parts:

In Part 1 I offer a broad rationale for my position
In Part 2 I offer specific recommendations for ACs and reviewers

In this part, I offer my own recommendations for the research community moving forward. I stress from the outset that this is a position. I do not attempt to speak from the Voice of God, but mainly from my own experience. I just want to stimulate a conversation; I have no ambition here of offering the final word.

My recommendations begin after the fold.

Recommendation 1. Treat ACing as explicit, rather than tacit, knowledge that is distinct from peer reviewing.

This recommendation is necessary because the community is growing and the number of submissions keeps increasing. We have to bring in new people, sometimes in large numbers. Rather than leaving it up to them to figure it out, or relying on SCs who are both overworked and unlikely to spontaneously handle this in a consistent way across the board, let us make an effort as a community to articulate what we want from ACs so junior researchers have at least a model to hold themselves to. We teach our Ph.D. students how to do peer reviews. We need to teach junior scholars how to be ACs. I put my money where my mouth is in the previous part.

Such an approach could also help undermine the “Old Boy Network” that the CHI community (like any other community) sometimes slips into, because it would offer mechanisms to support the infusion of new blood.

Recommendation 2. Each major HCI subcommunity should consider explicitly articulating (and maintaining over time) its primary peer reviewing criteria.

What makes a good contribution to the Design subcommunity is likely to differ from a good contribution in the UIST or CSCW subcommunities. Yet reviewers and ACs (for good and desirable reasons) often cross these lines–but do they bring tacit norms with them?

Such an approach would also be effective for handling emergent scholarly practices. For example, for years a number of us have been vocally complaining about “opinion papers” as a category in HCI, where “essay” is what (presumably) was meant. Good news: the language surrounding the submission process for essays is greatly improved. Bad news: it is less clear how well the HCI community understands the epistemologies and norms of effective essays.

Recommendation 3. Ensure that adequate opportunities for scholarly dialogue exist surrounding our paper decisions.

I feel very strongly about this one.

Beyond reviewer-AC dialogue, this recommendation also advocates for author-AC dialogue and ideally author-reviewer dialogue. Because papers don’t just “speak for themselves,” and given the stakes involved in the decisions we make, if authors feel confused or wronged during the process, there should be some mechanism to express that and work it out in a reasonable way.

This one is extremely difficult to deal with, given all the constraints involved. But I can still make some recommendations here:

Recommendation 3a. Allow authors to respond to as many points (criticisms, suggestions for revision) as they feel they need to during rebuttals.

I know people disagree with me on this, and in some subcommittees it may be appropriate if all that is at stake are questions of fact. However, in my subcommittee, at stake are often judgments and interpretations and the reasoning and assumptions surrounding them. The 5,000-character rebuttal is insufficient to support the intellectual role it was intended to play. It’s also not clear to me that the present form is better than nothing.

Journals in general and CSCW 12 have a process where authors typically submit a revision with a statement in a two-column format. In the left column is a quote or paraphrase of a particular reviewer concern, criticism, or suggestion. In the right column, the author responds to it and explains what will be/has been done about it. Such documents are arguably easier to read than rebuttals, and they structurally encourage people to stick to the issues, rather than expressing anger or frustration in a general sense. They also support quality revision, which is good for the long-term health of the discipline.

Recommendation 3b. Keep trying and iterating on the CSCW/UIST reviewing process from 2011.

In particular, the Revise and Resubmit process at CSCW was effective at supporting reasonable and, when necessary, lengthy dialogue between ACs, reviewers, and authors. And yet it was less work overall for ACs (at least, in my experience). Also: better decisions were made, and accepted papers had been much more thoughtfully revised than normal (at least in my experience). Now CSCW is having a banner year with tons of great papers that were substantially, not cosmetically, improved.

Recommendation 3c. Allow authors to communicate directly with their ACs.

Allowing this backchannel communication can alleviate many of the pressures on the rebuttal process, support productive scholarly dialogue, and lead to better decisions for now, and better papers down the line.

Recommendation 4. Find or develop mechanisms to enhance both reviewer and AC accountability–and make it part of institutional memory.

We could use both crude quantitative measures and also simple qualitative measures to evaluate reviewers and ACs and encourage high quality reviewing decisions.

Some crude quantitative measures include number of reviews done in the PC system, average length of reviews in words, scoring patterns (reviewer average compared to paper average, which would help identify the Negative Nellys and the Pollyannas), number of times accept/reject recommendation agreed with final decision, timeliness of reviews, and I’m sure many others.
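
As a rough illustration of how a few of these crude measures could be computed, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the toy scores, and names such as `reviewer_measures` and `score_bias` are hypothetical and do not reflect what PCS actually stores or exposes. Timeliness is left out because it would need submission timestamps.

```python
from collections import defaultdict
from statistics import mean

# Toy, invented records purely for illustration:
# (reviewer, paper, score on a 1-5 scale, recommended accept?, review text).
# None of this is real PCS data.
reviews = [
    ("r1", "p1", 2.0, False, "Weak evaluation and unclear contribution"),
    ("r2", "p1", 4.0, True, "Solid study with a useful design insight"),
    ("r3", "p1", 3.0, False, "Interesting but needs more related work"),
    ("r1", "p2", 3.0, False, "Method is fine but novelty is limited"),
    ("r2", "p2", 5.0, True, "Excellent paper"),
    ("r3", "p2", 4.0, True, "Good contribution and clear writing"),
]
final_decisions = {"p1": False, "p2": True}  # hypothetical: True means accepted


def reviewer_measures(reviews, final_decisions):
    """Return crude per-reviewer statistics as a dict keyed by reviewer id."""
    # Average score each paper received across all of its reviewers.
    scores_by_paper = defaultdict(list)
    for _, paper, score, _, _ in reviews:
        scores_by_paper[paper].append(score)
    paper_avg = {p: mean(s) for p, s in scores_by_paper.items()}

    # Group the raw records by reviewer.
    by_reviewer = defaultdict(list)
    for record in reviews:
        by_reviewer[record[0]].append(record)

    measures = {}
    for reviewer, records in by_reviewer.items():
        measures[reviewer] = {
            "reviews_done": len(records),
            # Mean of (own score - paper average): negative values suggest a
            # "Negative Nelly", positive values a "Pollyanna".
            "score_bias": mean(r[2] - paper_avg[r[1]] for r in records),
            # Fraction of recommendations that matched the final decision.
            "agreement_rate": mean(
                1.0 if r[3] == final_decisions[r[1]] else 0.0 for r in records
            ),
            # Average review length in words.
            "avg_words": mean(len(r[4].split()) for r in records),
        }
    return measures


measures = reviewer_measures(reviews, final_decisions)
for reviewer, stats in sorted(measures.items()):
    print(reviewer, stats)
```

In this toy data, `r1` scores every paper one point below its average and `r2` one point above, which is the kind of pattern such a measure is meant to surface. A real version would need far more reviews per person before a bias like that meant anything, which is part of why I call these measures crude.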

Some qualitative measures might include ratings of ACs by authors and SCs and ratings on reviewers by ACs and authors. These might include simple Likert scales that ask whether the reviewer understood the paper, whether actionable revision suggestions were offered, whether the rationale for the decision was explained, and so on.

Another idea is some sort of post-mortem, where SCs and ACs talk about their experiences–not just in a general way, but in an individual way. If an AC is a Negative Nelly or a Pollyanna, he or she might not even realize it, and yet I bet the SC generally knows.

—

As I get feedback and pushback, I may revise and/or expand this. Regardless, it is meant to be constructive and to start, not to finish, a conversation.

32 Comments

Jofish
14 years ago Permalink

One small comment:

Such an approach could also help undermine the “Old Boy Network” that the CHI community (like any other community) sometimes slips into, because it would offer mechanisms to support the infusion of new blood.

I would go a bit further: I think it should be policy that nobody serve more than two years out of three on any PC. People should be encouraged to come back after their year off, or to chair or PC other portions of the same conference, or PC for different conferences, but I think this explicit rule would be a minimal loss for anyone but a significant net gain for the community.

Dianne Murray
14 years ago Permalink

Excellent post, well-stated and I fully support your (and Jofish’s) recommendations. I would also call for specific workshop training in reviewing (as you say happens for your Ph.D. students). This may be hard to implement but I think the time has come to be proactive on this issue.

On another point, the other ACM SIGCHI conference of note is IUI (Intelligent User Interfaces) which runs on the CHI model too.

- AaronGenest (@AaronGenest)
  14 years ago Permalink
  
  As a student, I would go even further: we should be explicitly supporting a mentored approach to learning how to review. Inexperienced reviewers should be chaperoned and given less weight in a final tally of the numbers, but we should be asking for many, many more students to review. Given the number of students available, it would not be impossible to expect two student reviews per paper (in addition to faculty and professionals).
  
  I find it very difficult to get assigned reviews as a student. My reviews come from only people I already know (who are, of course, only too happy to unload a review on me); it’s very rare that I get a review assigned by someone with whom I am unfamiliar. This is despite the fact that I am listed as available with a fair level of expertise in a variety of fields. As a student, I should be swamped with reviews at every major conference, allowing me to improve my critiquing skills and increasing my exposure to the field.
  
  If we want to improve the quality of reviewing (above and beyond the discussion of the AC in this blog — a topic above my pay grade), we should be doing something constructive to grow the reviewing skills of our students in a systematic and aggressive manner.
  
  - Dianne Murray
    14 years ago Permalink
    
    See my post here, https://www.facebook.com/groups/iwcFB/ on 31 Aug 2011
    Extract….At the IwC Editorial Board meeting at CHI 2011 we agreed to initiate a scheme aimed at encouraging ‘new blood’ and at growing our reviewer pool for the future. We have created a category of Junior Reviewer so that post-graduate students, junior members of staff, R.A.s and the like can learn reviewing skills. We will provide Mentors; a Board Member will, where appropriate, choose a Junior Reviewer to serve as an additional fourth reviewer and give guidance and support. Junior Reviewers will gain invaluable experience in ‘learning by doing’ and have the added advantage of being able to keep up-to-date with the latest work and then moving on to being a reviewer in their own right after a time. A new classification is now in place and so nominations and recommendations are welcome….
  - Phoebeengers
    14 years ago Permalink
    
    One issue with student reviews is that they tend to be much more negative than other reviews. The issue here is not so much lack of background knowledge (in my experience, student reviews are very good in terms of really dissecting a paper and relating it to the latest literature) but that students often don’t have enough experience to be able to understand how flawed research can also be good research. This is why I always limit to 1 student per paper when I am AC.
Helena
14 years ago Permalink

I fully agree with Recommendation #4 (I agree with all of them, but #4 is the one I have been crying out for most). The only way ACs and reviewers will do a better job as a whole is to have accountability. If we are not willing to let go of the anonymous reviewer model then there needs to be some other method for accountability. I see no reason why we can’t instill something like that since all of the conferences you mentioned use PCS.

Ed H. Chi
14 years ago Permalink

Most if not all of these recommendations have been talked about in the community, and probably several times. The general consensus is that change needs to be gradual and a single change needs several years to know the effect.

Having reviewed for many other non hci confs, hci actually does very well in having good process and standard already in comparison.

Worth also to point out that literary criticism is structured around domains that moves slowly, and dialog techniques there might be problematic for hci. (My father is a prof in that field ).

- jeffreybardzell
  14 years ago Permalink
  
  Having reviewed for many other non hci confs, hci actually does very well in having good process and standard already in comparison.
  
  I don’t doubt you, but it is hardly a reason not to strive to be better. Also, more importantly, I think this varies by subcommunity. Subcommunities with relatively mature and stable norms are likely to enjoy better processes than subcommunities that are more fluid (like Design).
  
  Worth also to point out that literary criticism is structured around domains that moves slowly, and dialog techniques there might be problematic for hci. (My father is a prof in that field ).
  
  Don’t agree with you about literary criticism, but that’s beside the point. I never said that we should emulate literary criticism. I said that peer reviewing is a professional critical practice. There are many critical practices, such as design criticism, architectural criticism, film criticism, museum curation, game criticism, and interaction criticism that are better exemplars and are, in fact, what I had in mind. Thanks for giving me a chance to clarify that. BTW I have a post here relevant to that: https://interactionculture.wordpress.com/2008/10/19/species-of-interaction-criticism/
  
  God forbid we start having Lacanian reviews of CHI submissions!!
  
alandix
14 years ago Permalink

Thanks Jeffrey for detailed and well considered thoughts on this.

However, while improving conference reviewing is really important, I do wonder if the real issue is the nature of HCI conferences. To some extent the way conferences in HCI and other parts of CS have become important venues for final publication seems to me deeply problematic. The fixed timescales inevitably lead to compromises between rigour, clarity, interest and novelty. I worry about the examples that new researchers have to follow if they assume that the outcomes of this process, however low the acceptance rate, are paradigmatic.

In other disciplines the conference is where you find out what is going on in the discipline, meet people, etc., whereas the journals are where you look for the final rigorous versions.

Reviewing can be problematic in any venue and journal reviewers may well need the same kind of mentoring and feedback as you have described for conferences (Dianne I’m sure you’d second this). Certainly, peer review can never be an absolute guarantee of quality, but it is easier to do properly in the less constrained timescale of journal review.

Furthermore, conferences are expensive and primarily in the western hemisphere. Various conferences, not least CHI, have made efforts to be open to those in countries normally less well represented. However, inevitably, if the key research venues are conferences, this leads to an element of pay-to-publish and shuts out research from the developing world and in general those with less access to funding.

John Paolillo
14 years ago Permalink

The conference scene is a bit different from journals, but I’d like to comment from the journal perspective a bit. Conference proceedings are archival, so it is relevant in some respects.

I do about 50% of my reviewing for a journal for which I am associate editor. Almost all of the reviewing I do concerns methodological and/or statistical issues. I strive to be thoughtful and to argue points carefully, and to consider the status of the submitters (most often junior faculty or students). Unfortunately, a lot of the work I see, even from well-established people, is rather shabby in these respects. Methodologies are poorly presented or justified. Stats are garbled, or just plain BS that the author doesn’t understand. I can’t willingly accept these papers. Sure a lot of work went into it, but a lot of what I see look like salvage cases.

Over time, my recommendations have gotten harsher, as experience has taught me that the shabby stuff gets through with minimal modification regardless, especially when there are senior people behind it. Maybe no one cares, but I think that science, if it is to count as such, and if it is to be something other than a nerdy fan club, needs people to seriously examine content and make sure that something that gets published really makes sense.

I think people used to do this (e.g. Lorenz 1963, Deterministic Nonperiodic Flow is a masterpiece both scientifically and rhetorically), but now it seems to be nearly all that one can do to try to warn people not to make fools of themselves, even if it would probably take someone else 20 years to discover the problem (if ever at all), by which time people are busy smelling the primroses down some other path.

I spend more time on this than on my own research, get next to no credit, have little effect on outcomes that I can see, and don’t really enjoy it either.

Julie Kientz
14 years ago Permalink

This is a great series of posts to open up a dialogue about how to improve the process. I agree with you on many points.

One thing I feel that is missing is the fact that there is still severe reviewer and AC overload and fatigue. Many of the problems you’ve identified are likely a result of this. My guess is that many people don’t actually WANT to write bad reviews and are not lazy (most lazy people don’t end up in research), but they are actually overworked and do not have as much time as they would like to spend on the process. Many ACs are junior faculty also striving to get tenure, secure grants, learn how to teach, etc. In many universities, “Service” is usually supposed to be just 10% of what we do, which includes ACing and reviewing, but also includes the service we provide to our universities such as PhD admissions, curriculum committees, etc.

Many of the suggestions you make require even more work on the part of the reviewers and the ACs (reading through lengthier rebuttals, more conversations with authors, etc.). I don’t disagree that these would help improve the process, but I wonder if they would work in practice. My guess is that the reason the rebuttal process is ineffective is that by the time the rebuttals roll in, many reviewers have already “checked out” of the process and have moved on to the other million things they had on their to do list. As an AC, it’s often hard to get reviewers to engage in a discussion, let alone change their scores. Perhaps requiring reviewers to check a box saying “I have read the rebuttal” with a separate box for their response could be a simple thing to help with this to force them to at least re-engage.

One solution would be for people to turn down AC and review requests if they don’t have the time to do them properly, as you suggested. This is always a good idea, but from my understanding, it was difficult to find people to agree to do either last year, and thus we end up in a situation where reviewers are agreeing to do 8-10 reviews, and we have to start asking PhD students to fulfill AC responsibilities (while many are qualified, it still takes a lot of time to establish a network of people who will agree to do good reviews).

I think your suggestions about accountability, along with Jofish’s mandatory “sabbatical” might be practical steps to help encourage people not to say yes to service requests they don’t have the time to do properly. But if we all start saying no to everything because we’re too busy, where does that leave the community then?

Gilbert Cockton
14 years ago Permalink

Ed’s right Jeff, much of this has been discussed, but not all. You do have some good new points, and even those that look like the existing ones are nuanced in different ways.

ACM barf on #4, concerns about libel, slander, personal data etc. Black lists are out, white lists are out, ….

- Paul Resnick
  14 years ago Permalink
  
  I hope we won’t let ACM barf on the idea of accountability! Many of Jeff’s suggestions in #4 are quite good starting points on ways to let people get credit or blame for being good or bad reviewers/ACs, and they would surely hold up in court against any libel or slander accusation (truth is a defense against libel accusations).
  
Ben Shneiderman
14 years ago Permalink

Antti Oulasvirta’s blog post covering 8 reasons your paper gets rejected by CHI reviewers and ACs is a helpful reminder that CHI is a scientific discipline. Jeff Bardzell correctly reminds us of the social and flawed nature of all review processes. Then, Ed Chi wisely suggests gradual change to a pretty well-working system.

I like submitting to CHI because of the typically thoughtful and multiple reviews. However, some fraction of the papers have always produced contentious discussions, so the 5000 character rebuttal is a reasonable approach to give authors a voice, while not getting enmeshed in a lengthy time-consuming debate.

I’ve supported proposals such as Eric Baumer’s to have the content of reviews for accepted papers available online, thereby raising the profile of reviews, and giving reviewers visibility for their reviews as intellectual contributions (with names visible). This doesn’t address the issue of negative reviews that deny publication, but it could have a salutary overall effect of encouraging thoughtful reviews.

- Barry Brown
  14 years ago Permalink
  
  I don’t think Jeff is saying peer review is “social and flawed” versus Antti’s “scientific” account (I assume Ben is being mischievous)
  
  But that, perhaps accidentally, does nicely characterise what makes CHI so interesting. Most of the time we wouldn’t meet, argue – and misunderstand – each other. CHI forces us to do so. That’s real science – questioning your assumptions.
  
Tarun Gangwani
14 years ago Permalink

1) If there were a rebuttal process from submitters, it ought to be anonymously made, just to eliminate any bias. It would be up to the AC/reviewer to divulge their identity.

2) Julie Kientz makes a good point — there is likely some fatigue occurring if people are only providing “3 sentence” rebuttals (which is impressively awful). Subcommittees ought to scale as large as the constituents they represent… does this not occur in the CHI world?

How many rounds of reviewing does a CHI paper typically go through? In Psychological Science & Review, I’ve seen a paper float by 4 people, who provide substantial feedback, before the paper is returned with either a rejection/acceptance. Most of the feedback is constructive, though. You may have one oddball here or there who is a part of the “old boy” community, but maybe the community is different because it is more “established”?

From my limited perspective, HCI suffers a growth “problem” — the ideas are scaling so fast that it is difficult to pinpoint which ones are viable. Taking the “let’s let everyone join the party” approach might lead to some folly that shouldn’t even be reviewed.

A rejected paper, to me, represents a paper that had significant work behind it but missed the motivation to be worthy of publication. If more rejected papers were accepted (even with critical review and feedback), then HCI would perhaps be too wide in scope when it comes to the ideas?

I’m merely playing devil’s advocate here, though. I will say that nearly ALL conferences/journals I’ve seen tend to be remixes/variations on some theme that is unspoken but understood. A strong group tends to cling to the ideals it can showcase as viable and doesn’t want to be tainted by radical change, else they risk being “unseated”. Sounds like an ego thing. Don’t know if I’m on the right track.

David McGookin
14 years ago Permalink

I was also reading Antti’s rebuttal on twitter for having words put in his mouth. The problem is there are two points. Antti’s initial observation was about papers with scores of 1-2.5, the auto-rejects from CHI (he also wrote from the perspective of a particular subcommittee). My experience of those papers is that they have a number of the key problems, shouldn’t be published and add to the “noise” when trying to research useful things.

With CHI, because there are so many submissions and only so many slots for presentation (76% of submissions are rejected), a large bulk of papers fall in the dead zone, between say 3.5 and 4.5. Not bad enough to get rejected, not good enough to be auto accepted. Only a small subset of these will be accepted. The quality of reviews really counts there, and one bad reviewer or AC can kill the paper. I agree with Jeff that the effort required to submit a paper that would fall in this range is disproportionate to the ease with which it can be rejected.

I said on twitter at the time of Antti’s original post, that we need to root out bad reviewers in the same way as bad papers. Bad reviewers not only keep out good science and results, they also let through bad results and flawed work. That has an impact on everyone else. Especially if a reviewer of your paper considers you should be using these bad results, and rejects because you don’t. All submissions deserve fair, factual, unbiased and considered reviews. No ifs, no buts, no maybes, no I didn’t have time to do it properly. If there are reviewers that continue to fail to provide those, they should be banned or at least more closely monitored. For example, weight review scores by the previous “quality” of that reviewer’s reviews. How you do that, of course, is the rub. In both reviewing and reviews I have gotten, I’ve seen cases where the reviewers just haven’t cared to read the paper or think about it.

I’ve read papers this year, from fairly well respected conferences, that clearly misinterpret others’ work. Good reviewing isn’t just about making rejected authors feel better. However, there isn’t a lot of motivation on conference organisers to up the quality. They just need to fill the slots available and, as long as there are a fair number of papers to reject, can claim high quality. I’m sure the last statement is a little cynical, but the standard review model assumes reviewers are doing it right, without any checks and balances.

Finally, if we push conferences as premier venues, where does the interesting but not quite done and we don’t know what it means and could really do with some good discussion work go? Are we pushing academia towards only showing finished work that is so polished it cannot possibly be questioned? Are we in danger of losing the discussion and critical argument factor in research?

annlight
14 years ago Permalink

One of the things that distresses me about conference reviewing processes is the lurking acceptance rate issue and the way that consequently some reviewers and ACs are looking for reasons to reject. Journals often work with a reason to accept mentality and it is far healthier for the community. Whereas a good idea that has not quite flourished yet can be nurtured by reviewers for a journal, this is not the culture of the big conferences. Although it doesn’t motivate everyone to the same degree (and some people don’t even know), the idea that about 75% will/should be excluded will affect the way that reviews and meta-reviews are written.

I found it exciting, though daunting in terms of workload because of the conference deadlines, to hear of the mods to CSCW this year. But I also heard people muttering about unacceptably high acceptance rates resulting from the opportunity to rework papers that had not quite achieved publishable status first time. And conferences that deliberately work to be inclusive, to support bringing the community together, are not considered first rate because their acceptance rate is higher than the ‘best’ conferences. Though, in fact, their papers may be as good – rejection rates reflect number as well as quality of submissions.

CSCW has proved that it is not wholly a factor of timing that papers for conferences are one-hit whereas journal papers can go round until they are publishable. I usually see a journal paper 3 times if I have made serious comment on structure or content, but by the third time I am much happier to see it printed. However, conferences continue to be competitive spaces in a way that journals are not (except dedicated special issues). You may not get published fast, but short of writing total rubbish, you have a chance to benefit from being seen repeatedly by the same reviewers and editors.

Knowing that last time DIS had an acceptance rate of about 18% and that submissions are up 50% – what’s that going to do to the mindset of the ACs and reviewers who are applying their critical gaze?

Loren Terveen
14 years ago Permalink

Great discussion. I’d like to reply more substantively, but have time only for a few brief thoughts.

1. Jeff, I really appreciate your positive words about the new CSCW reviewing process. I think this is a qualitative step in the right direction, I think we will continue to improve it, and I know that the rest of the CHI community is watching carefully, so there is reason to be optimistic that this significantly improved method will spread.

2. In addition to all your other comments about ACs and reviewers, I (and a number of other people) have analyzed review data for a number of conferences. One striking finding is the amount of disagreement between reviewers: for all but (say) the top 10% and the bottom 30-35%, typically at least one reviewer (and/or AC) is pretty much in favor of accepting the paper, and at least one is pretty much in favor of rejecting it. One position to take is to say that one of them is wrong. I don’t think this is the correct interpretation. Rather, they represent different values (and, one presumes, different segments of the broader community): there is no one right outcome. Several good reviews, even accompanied by one or more negative reviews, are evidence that a segment (perhaps a large segment) of the community will value the paper.

3. In another analysis, we’ve found no correlation between review scores papers received when they were submitted and the number of citations they subsequently received. More evidence for lack of certainty in the review process. Also, as I understand it, analysis shows that winning or getting nominated for a Best Paper doesn’t lead to more citations, either, which again is fascinating (or sobering, perhaps).

4. Finally, like Gilbert, I’ve seen incredible levels of paranoia about your proposal #4. Like Paul, I think this should and can be done right, but there is an extreme worry about any approach that involves recording (and using) data about the performance of reviewers (including ACs).

John C. Thomas
14 years ago Permalink

Interesting discussion. I’d like to add just a couple more comments. First, I believe (and have no evidence except the general findings of social psychology) that both the reviewers and the ACs may be applying an “unspoken” and mostly unconscious filter to papers which might be described as “Is the person who wrote this someone we want in the community? Or, are they really one of *us*?” Imagine two papers describing the same new system and the same empirical results about that system. One of these papers cites a lot of SIGCHI authors and uses SIGCHI terminology. Another of these papers cites a lot of IEEE authors or HFES authors and uses different terms. I strongly suspect the first has a much higher chance of getting in. What gets very tricky, as hinted at by one of the comments above, is that different ACs or reviewers sometimes feel they are part of different subcommunities and it is really the allegiance to these subcommunities that comes into play in discussions about the “quality” of a paper.

Second, underlying all of this, we are all often too much in a “rush” to do really good science, really good design, really good analysis, really good writing, really good reviewing, or really good ACing!! This is not unique to SIGCHI conferences, but seems endemic to our society. Software releases come every couple months and they are always buggy. Papers get sent in and it is clear that despite MONTHS of effort behind the papers, the authors were too rushed to actually PROOFREAD their paper (as opposed to running a spell-check program). I don’t have a “solution” for either of these issues.

Joshua Tanenbaum
14 years ago Permalink

While I found this post extremely interesting, and have been slowly digging through the subsequent discussion, one thing that I think is missing from both is a more considered approach to the notion of “critique”. I come to HCI from a Humanities and media studies background, so I have a vested interest in the nature of critique at all levels within this community. I was excited to see Jeff open this series of posts up with some strong claims about reviewing as a form of critique, but it seems that the discussion thus far has veered into a broader conversation about the structural and organizational issues around maintaining quality of reviewing. While I agree that these are crucial issues that need to be considered, I wonder if we have really explored the implications of reviewing as a critical practice.

Within HCI I think critique and critical approaches to technology are treated with skepticism because of their “non-scientific” roots.
I’ve heard it said that it’s easy to criticize but very difficult to critique: the preponderance of criticism over critique in HCI is something that has been remarked upon already by Bertrand Meyer (see his article here: http://cacm.acm.org/blogs/blog-cacm/123611-the-nastiness-problem-in-computer-science/fulltext). As a field, I think there is a discomfort with critique as an epistemology for either papers or reviews because it lives outside of the comfortable box of “verifiable Truth”.

As Jeff has rightly pointed out, the role of the reviewer is not to mechanically identify errors in method, but to render judgement on the paper along a number of vectors. Rendering judgement requires some form of rhetorical justification: in law this would be the “opinion” offered by the judge on a subject. This means that reviewing a paper cannot exist within some sort of value-free zone: instead the reviewers must take a position in relationship to the paper that reflects their own values and their understanding of the values of the field/subcommittee/conference, etc.

The crucial thing to note here is that this isn’t a practice that is solely amenable to structural changes in the reviewing process. While it is possible to attempt to reform our current systems to create more room or incentive for thoughtful critique, this on its own is insufficient for the type of widespread cultural change that a call for critique entails. And ultimately, what I think Jeff is talking about here is a shift in cultural norms, rather than a change in institutional structures. I think it is naive to think that simply changing our reviewing process will solve this problem. Instead, I would argue that we need to actively promote a more nuanced process of critique throughout our professional lives: do we train graduate students to simply identify type 2 errors, or do we instill in them values and skills that allow for more skillful rhetorical investigation? Do we stop reading a paper when we detect a problem with its experimental rigor, or do we seek to position it within a larger discourse? Do we write a paper that relies solely on statistical validity to demonstrate its relevance, or do we actively seek to tie our writing into a broader rhetorical discourse around HCI? This means that like it or not, we have to embrace traditions of critique, even if that doesn’t sit right with our more scientistic inclinations.

My two cents.

- jeffreybardzell
  14 years ago Permalink
  
  Beautifully stated. I wish I had said this so well myself.
  
  - Phoebe Sengers
    14 years ago Permalink
    
    Hear hear!!
alandix
14 years ago Permalink

Fascinating stuff Loren, especially point 3! Wow!! Although I don’t really buy the ‘different values’ excuse (I know not meant to be, my words) in point 2. A good reviewer will realise there are different viewpoints and attempt to review given this. If we have high variance we either have (a) bad reviewers or (b) no coherent academic discipline. Take your pick.

John, on CHI vs IEEE/HFES, do remember that a few years ago the CHI reviewing guidelines explicitly asked for references to recent work *especially CHI*. As in any user interface, users (reviewers) do what you ask them to do, however stupid.

Eric P. S. Baumer
14 years ago Permalink

Great series of posts, and great discussion with tons of insightful points.

Joshua’s comment captures something that had been nagging me, as well, while reading the discussion. Jeff’s original point, I think, is not about the structure and mechanics of the review process, it’s about grappling with what the culture of reviewing means for our field. That said, cultural norms do have implications for institutional structures, and considering the relationships between the two may be one valuable way forward.

For example, as Ben mentioned, I’d suggested a year or two ago the idea of making reviews for accepted papers publicly available. Not only might such a step help with Jeff’s point #4 about accountability, but it might also help in terms of training junior members of our field about norms and standards of reviewing. I think the upshot of Jeff’s discussion here is not only acknowledging the role of the AC as critic, but examining how our institutional structures can be (re)shaped to (better) reflect the cultural norms we want them to embody. That said, I’m also mindful of, and in agreement with, Ed’s advice that slow, deliberate movement is likely more desirable and more healthy for our field.

As a last point, it is perhaps important to recall that SIGCHI and ACM are not the only places where conversations about (the structure of) peer review are happening. For example, David Horrobin’s Something Rotten at the Core of Science questions whether or not peer review is even a valuable enterprise, suggesting that from a scientific perspective there is little evidence that peer review is actually beneficial. Perhaps, then, another way of looking at Jeff’s piece is not only as a critique of peer review, but also as a defense of peer review, articulating why it’s valuable, and suggesting that acknowledging its nature as a critical endeavor may help us better understand and use its contribution.

- jeffreybardzell
  14 years ago Permalink
  
  “Perhaps, then, another way of looking at Jeff’s piece is not only as a critique of peer review, but also as a defense of peer review, articulating why it’s valuable, and suggesting that acknowledging its nature as a critical endeavor may help us better understand and use its contribution.”
  
  What a provocative idea. I had not considered it that way, but now that you say it, I think I might believe that. Or maybe I believe that that is the ideal to which we aspire.
  
  I have heard people on AC committees say things like “we don’t have much of this kind of paper in this community” or “it would be really good for the community to see more of this methodology even though the findings are banal” or “if this is going to be the community’s first impression of X, then we need to be careful, because it may not cast X in a favorable light.” All of these (clearly related) arguments are curatorial: they argue for/against the paper on the basis of largely external (to the paper) criteria. (Obviously, internal criteria will matter, too, but this curatorial dimension can be a huge X factor in papers that are not clearly in or out.) And this curatorial dimension could also help explain Loren’s finding of little to no correlation between review scores and citation counts. Putting something out there in part because as a type of paper it is not represented well is a high risk/high reward strategy from the point of view of future citations.
  
Mark Rouncefield
14 years ago Permalink

Some observations:

1. Life is unfair – it just is – get used to it. Most attempts to make things ‘fairer’ usually simply result in different forms of unfairness. If you adopt this frame of mind you can probably overcome your concerns about CHI reviewing and the rejections your papers get.
2. Most CHI submissions are not very good – I was going to say ‘shite’ but I’m making an effort not to be offensive. Some, many, are ‘completely not very good’, and, for a variety of reasons, things seem to be getting worse. Out of 1200 submissions around 800 or so can probably be quickly discounted as not very good. Of the rest around 70-100 will be excellent and around the same number will be very good. That leaves around 100 or so that cause all the trouble – because the 50-70 or so that get accepted are in reality no better than the 50-70 or so immediately below them in the ranking that don’t get accepted. We spend a disproportionate amount of our time on these. Personally I wish we would just accept this problem and stop pretending that we can work our way to some reasoned decision – because the reasons are almost always bogus and have a strong tendency to piss people off in a major way. So let’s just hold a lottery instead – that’s right – a lottery. Write to the authors and tell them that their paper is good but that there are too many good papers for the available slots and there is no other fair way of differentiating between them.
3. There’s nothing that much wrong with CHI reviewing – two sentence reviews or extended reviewer rants are the exception not the rule. I think providing every paper with 4 reasonable reviews, to have those reviews discussed and to provide an opportunity to respond to the reviews, is actually quite impressive. In general I think the rebuttal process works OK – allowing authors to respond to reviews, either by showing how their paper can accommodate reviewers’ criticisms or by providing an opportunity to vent their spleen at the reviewers’ stupidity – either is a reasonable response to me. I do like the idea of providing examples of reviews (and rebuttals) of successful papers: put them in the Proceedings (with appropriate caveats since the reviews apply to the original submitted paper and not the final camera-ready that presumably will have been improved by the reviews). Personally, I don’t think PhD students should be doing reviews – but only because, as a product of their recent training, they often have a strong tendency to be mean reviewers (and scorers). And the alt.chi approach is simply corrupt, or at least I think that’s the word for a system where you put your paper online and invite your friends to review it. Despite all the debate I’ve seen, any problem with CHI reviewing is not really connected to issues of ‘science’ or ‘critique’ – it’s really a practical problem, it’s about the scoring. It’s the scoring that causes the problems. Firstly, because we clearly don’t trust and so don’t take enough notice of the scoring and end up discussing far too many papers. Secondly, because reviewers appear unable to make up their minds and consequently we get the phenomenal clustering of scores around 2.5-3.5. And the answer is simple – don’t let people give those scores, make them decide between a 2 and a 4 – ultimately reviewers should be made to decide do they think this paper should be in the conference or not?
But, if you still think the standard of reviewing is a problem, the answer is simple enough – pay the reviewers.
4. Before you solve a problem, you need to correctly identify it – despite the debate here and the fact that we’ve all had what seemed like unfair or mean reviews – the problem is not mean or lazy or generally despicable and unaccountable reviewers and ACs. It really isn’t. Of course such people exist (me for example) but such an analysis, whilst understandable, is just silly – you’re completely missing the point and addressing the wrong problem. The problem is structural and organizational to do with how the papers sub-committees are organized and run and decisions made. There’s something almost Monty Pythonesque about decision-making in the sub-committees – I have in mind the Ministry of Silly Walks sketch – all that’s missing is a dead parrot. What happens at present is that papers that get to the committee have already, in a process lasting several weeks, had four, and usually five, reviews; those reviews and scores have been discussed and been rebutted and that rebuttal then discussed. But if they can’t agree in the committee the paper is given to some poor knackered, jet-lagged and possibly pissed AC to review overnight and effectively make the decision for us. If that AC can’t do it the paper is given to another (knackered, jet-lagged, hungover etc) AC (sometimes on a different subcommittee so they haven’t heard any of the debate about the paper) to read in 30 minutes or so and make the decision for us. If that AC can’t decide the decision goes to a vote, where most of the people voting haven’t read either the paper or any of the reviews. Finally if the vote is too close to call we put a dead parrot in the middle of the table and spin it… OK, I made that last bit up, but it’s in line with the general silliness of the whole process.
5. Talk about ‘accountability’, of ACs and reviewers, is mostly hot air. Personally I think ACs should never score beyond the range of their own reviewers – after all, they chose the reviewers based on their knowledge of reviewers’ expertise – they should stand by that initial decision and if they don’t they should be given a hard time in the sub-committee – though they rarely are. Anything beyond this kind of ‘accountability’ just isn’t going to happen. Firstly, because ACs and reviewers, quite rightly, are not going to tolerate it. ACs already donate considerable amounts of their time, thought (and money) to the process, so I think they will probably tell you where you can stick your suggestion that they should be made accountable and subject to formal scrutiny (and I think it will be somewhere that the sun don’t shine). Secondly, because the mechanics and practicalities, the criteria and the process (as well as the ethics) are beyond any easy resolution – you’ll doubtless still be talking about this in 5, 10 years’ time.

Joseph A. Konstan
14 years ago Permalink

I love this thoughtful discussion.

I want to raise another problem/suggestion. I think we frequently have trouble not just in CHI, but in much of the computing field, with distinguishing between correctness and importance. Even though our review forms are very clear about the separation, too often I see cases of unimportant work that “can’t be rejected” because it has no flaws, or of important work that is rejected even though the flaws don’t compromise the key contribution. (To be fair, I’ve also seen exceptional reviewers and program committees who fight against these problems.)

I’ve wondered for a while how things would be different if we were willing to do reviews earlier in the process. WARNING: This isn’t even feasible for all kinds of research — the model I’m using is one I’ve found in parts of the health sciences.

Imagine a situation where you write all of a paper except for the results and discussion. In other words, you submit the background, the hypotheses or research questions, the methods, the metrics. And at that time, you could receive one of many review results:

* Exceptional: If you carry this work out as planned, it should be published, regardless of the results. It is a well-designed research study with well-justified methods, and the field would find it important and interesting to know how the study comes out regardless of the results. (One might imagine a study here on the effects of texting-by-talk while driving; it would be interesting if we found that driving was impaired; it would be interesting if we found that the texting was impaired but driving wasn’t; it would be interesting if nothing were impaired; etc.)

* Conditional: If you carry out the work as planned, it would be publishable if you get a particular result. But if you don’t get that result, it isn’t interesting (or at least not at the level of this conference/journal). (One might imagine someone who proposes that mousing may be more effective if the clicking is done using keyboard keys with the other hand, while the mousing hand is only used to point. Assuming good methods, a good set of ways to measure effectiveness, etc., this study would be very interesting if using the second hand really is better along some of the dimensions. However, if it is uniformly worse, nobody’s that interested in publishing “yet another wacky idea that doesn’t work.”)

* Don’t Bother: Even if you carry out this study perfectly, we don’t care. The potential results aren’t interesting or important enough to publish. Save yourself the effort and work on something else. (Many of these studies could be small variations on existing ones. For example, a study looking at whether changing the link color in social network invitation e-mail from blue to blue-green makes a difference.)

Of course, there is much more that reviewers could do here. They could offer critiques on the methods that _could actually affect the work about to be performed_, not just complain about what wasn’t done right.

I mentioned the health science element here. I work with epidemiologists, some of whom take an approach not very different from this. Their grant submissions are detailed enough to serve as a review of the proposed research (with all methods spelled out). They justify sample sizes, metrics, and the rest. Another similar pattern is seen in “mega-journals” like Science and Nature, where an abstract is submitted and articles may be rejected immediately based on lack of interest (before any scientific review); being reviewed is then a check on the correctness, once the importance filter has been passed.

Gilbert Cockton
14 years ago Permalink

Joe’s suggestion is interesting. Most rejected CHI papers are methodologically weak, and if the authors had been able to write a paper with everything except results and discussion in advance, it would have been rejected. Such a system runs the risk of greatly increasing poor submissions as authors make use of a free research design review service, and they could submit more as they would save the time from not carrying out the experiment.

The difficult papers are not in this category however. They are ones that methodologically challenge the review process, either with approaches for which CHI has few competent ACs (plus SCs may not know who they are), or ones that competently breach existing rules of methodological hygiene. The current short rebuttals make it difficult to reply to incompetent reviewing here. The simple answer may be to develop such papers outside of CHI and develop a corpus that can be referenced in later CHI submissions.

I was very lucky with my CHI 2009 paper. It was methodologically unusual for CHI, but I had two reviewers who knew what to do with it, and also what to do with the gormless third reviewer who was clueless. I also had a sympathetic AC who managed the reviewers well. The chances of repeating this with previous or subsequent challenges to CHI orthodoxies are very low. We may have to accept that a proper alt.chi is the best way to deal with such work and to establish examples of how to do really new stuff well. Reviewers who clearly don’t understand a paper, its position or its approach, and their legitimacy, can be challenged publicly in alt.chi reviewing. I can’t see how the CHI papers process can be revised to make this sort of fair exchange possible.

The bottom line is that it has always been hard to get some types of new work into CHI. Some authors learn to tone down challenging positions to avoid causing offense or reviewer fits, others are skilled in a bait and switch where the awkward material is slipped in for the final version of an accepted meek submission.

Some, including me, simply don’t care. I write as I believe is necessary and take the consequences. I learn how challenging positions are received by random reviewers and ACs in the process, and over the years this helps me to steer papers into the right forum in the right format.

stuaart
14 years ago Permalink

I wrote a metablog about all this stuff. Might be of interest:

http://notesonresearch.tumblr.com/post/22650772011/hci-in-crisis-two-sciences

Chung-Ching Huang
11 years ago Permalink

I wonder if there is any discussion of the new CHI 2016 review process.

http://sigchi.tumblr.com/post/108282241520/changes-to-the-submission-and-review-process-for

Jofish
11 years ago Permalink

Chung-Ching, I’m not sure quite what your question is. We definitely took these sorts of points into account when redesigning the current process. Maybe not quite line-by-line, but strongly informed by this and related discussions.
