Author Archive

SWIRL ’12 looks at The Future of IR

January 10, 2012

SWIRL is a workshop series that aims to explore long-range issues in information retrieval.  The 2012 meeting will take place next month in Lorne, Victoria, Australia.

Participants were asked to nominate three papers that “represent important new directions, research areas, or results in the IR field.”  Since highlighting the bleeding (and soon-to-be-bleeding) edge is a goal of NR, I decided to mirror the list of nominated papers below.

(Original list is at

What do you think of this list?  Are there other papers you feel should be on it? (Undoubtedly there are, as this is the expressly limited input of a few participants!)  Which research do you think has the strongest implications for IR over the next few years?

Categories: Conference, Survey

SIGIR 2011 Highlight: “Out of Sight, Not Out of Mind”

August 1, 2011

A “highlights” post is meant to call attention to a paper that grabbed our attention and that we think is worth your time.  This paper, “Out of Sight, Not Out of Mind: On the Effect of Social and Physical Detachment on Information Need” by Elad Yom-Tov and Fernando Diaz, won an honorable mention last week at SIGIR 2011 in Beijing.

This paper examines the effect of social and physical distance on queries for events.  The authors examined three cases: the San Bruno, CA gas line explosion of 9/9/2010, a violent storm in New York City that took place on 9/16/2010, and the 2010 Senate election in Alaska.  Data came from a query log of Yahoo! users.  Physical distance was computed using the profile ZIP code of each user, and social distance was computed using the ZIP codes of each user's instant messenger buddies.
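To make the two distance notions concrete, here is a minimal sketch of how they could be computed from ZIP codes. It is not the authors' method: the `ZIP_CENTROIDS` table, its approximate coordinates, and the choice to take the closest buddy as the "social distance" are all assumptions made for illustration, and the paper's actual definitions may differ.

```python
import math

# Hypothetical lookup table of approximate ZIP code centroids (lat, lon).
# A real study would use a complete gazetteer; these values are illustrative.
ZIP_CENTROIDS = {
    "94066": (37.62, -122.43),  # San Bruno, CA
    "10001": (40.75, -73.99),   # New York, NY
    "99501": (61.22, -149.89),  # Anchorage, AK
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def physical_distance_km(user_zip, event_zip):
    """Distance from the user's profile ZIP code to the event's ZIP code."""
    return haversine_km(*ZIP_CENTROIDS[user_zip], *ZIP_CENTROIDS[event_zip])


def social_distance_km(buddy_zips, event_zip):
    """Assumed definition: distance from the closest buddy to the event.

    Buddies whose ZIP code is not in the table are skipped; returns None
    when no buddy has a known location.
    """
    known = [z for z in buddy_zips if z in ZIP_CENTROIDS]
    if not known:
        return None
    return min(physical_distance_km(z, event_zip) for z in known)


if __name__ == "__main__":
    # A New York user with one buddy in San Bruno: physically far from the
    # gas line explosion, but socially close to it under this definition.
    print(physical_distance_km("10001", "94066"))
    print(social_distance_km(["94066", "99501"], "94066"))
```

Taking the minimum over buddies is only one option; a mean or a count of buddies within some radius would be equally plausible readings of "social distance."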

The paper finds, perhaps not surprisingly, that the volume of queries related to each event decreases with distance and time from the event, and that searchers with closer geographical and social ties both query more and query for different kinds of information.

I especially like this view of social search, and of local vs. general-informational search.  I would love to see work on this covering more events and events with a wider range of impact.  The selected events were ones that have a local focus but that achieved nationwide attention in the United States.  It would be interesting to look at events with a longer time frame (trials, evolving events) and to consider whether there are events that have a larger impact socially than geographically.

 author = {Yom-Tov, Elad and Diaz, Fernando},
 title = {Out of sight, not out of mind: on the effect of social 
          and physical detachment on information need},
 booktitle = {Proceedings of the 34th international ACM SIGIR conference 
              on Research and development in Information Retrieval},
 series = {SIGIR '11},
 year = {2011},
 isbn = {978-1-4503-0757-4},
 location = {Beijing, China},
 pages = {385--394},
 numpages = {10},
 url = {},
 doi = {},
 acmid = {2009970},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {distance, information, need, physical, social},

SIGIR 2011 Previews

June 1, 2011

Accepted SIGIR 2011 authors have started posting their papers online, so I have decided to collect them here.  Please comment on the ones you think are the most interesting and important.  As I get more papers, I’ll edit this post to add them.

Nima Asadi, Don Metzler, Tamer Elsayed, and Jimmy Lin, “Pseudo Test Collections for Learning Web Search Ranking Functions”

Evangelos Kanoulas, Ben Carterette, Paul D. Clough, and Mark Sanderson, “Evaluating Multi-Query Sessions”

Michael Bendersky, Don Metzler, and Bruce Croft, “Parameterized Concept Weighting in Verbose Queries”

Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis, “Intent-Aware Search Result Diversification”

Elad Yom-Tov and Fernando Diaz, “Out of sight, not out of mind: On the effect of social and physical detachment on information need”

Tetsuya Sakai and Ruihua Song, “Evaluating Diversified Search Results Using Per-intent Graded Relevance”

Ferhan Ture, Tamer Elsayed, and Jimmy Lin, “No Free Lunch: Brute Force vs. Locality-Sensitive Hashing for Cross-lingual Pairwise Similarity”

Ben Carterette, “System Effectiveness, User Models, and User Utility: A Conceptual Framework for Investigation”

Manos Tsagkias, Maarten de Rijke, and Wouter Weerkamp, “Hypergeometric Language Models for Republished Article Finding”

Wouter Weerkamp, Bogomil Kovachev, Richard Berendsen, Edgar Meij, Krisztian Balog, and Maarten de Rijke, “People Searching for People: Analysis of a People Search Engine Log”

David Elsweiler, Morgan Harvey, and Martin Hacker, “Understanding Re-finding behavior in Naturalistic Email Interaction Logs”

David Elsweiler, David E. Losada, José Carlos Toucedo, and Ronald T. Fernández, “Seeding Simulated Queries with User-study Data for Personal Search Evaluation”

Aleksander Stupar and Sebastian Michel, “PICASSO – To Sing you must Close Your Eyes and Draw”

Avishek Anand, Srikanta Bedathur, Klaus Berberich, and Ralf Schenkel, “Temporal Index Sharding for Space-Time Efficiency in Archive Search”

Categories: Conference

Vote for the best CIKM 2010 papers

October 26, 2010 1 comment

CIKM 2010 is taking place right now in Toronto, Ontario, Canada.  I have collected all the papers in the IR track below.  Please take a moment to vote for those papers you think are the most provocative, innovative, startling, or otherwise interesting.  You can vote for up to 3 papers.

I had to make a hard decision here to just do the IR track.  CIKM has four tracks — IR, DB, KM, and industry, and there are entire sessions in the other tracks that are of interest to the IR community.  Mostly I decided to constrain this poll to the IR track to keep the size reasonable.  There may be an additional poll for DB/KM/IND papers if the comments go that way.

After the conference, the Not Relevant editorial board will choose several papers based on the votes here, and post summaries of those papers.  The end result should be a must-read guide to CIKM 2010.

Categories: Meta, Review

Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets

May 12, 2010 7 comments

Gordon V. Cormack, Mark D. Smucker, and Charles L. A. Clarke
University of Waterloo

The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam — pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset.

We show that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of “honeypot” queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering — from among the worst to among the best.
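As a rough illustration of the filtering-and-re-evaluation idea in the abstract, the sketch below drops documents whose spam score exceeds a cutoff and then recomputes fixed-cutoff precision. It is not the authors' classifier or their evaluation code: the score convention (0 to 1, higher meaning more spam-like), the `threshold` value, and the document IDs are assumptions, and plain precision at k stands in for the estimated measures (estP10 and so on) the paper reports.

```python
def filter_spam(ranking, spamminess, threshold=0.5):
    """Keep documents, in rank order, whose spam score is below the cutoff.

    `spamminess` maps doc id -> score in [0, 1]; documents without a score
    are treated as non-spam (an assumption of this sketch).
    """
    return [doc for doc in ranking if spamminess.get(doc, 0.0) < threshold]


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k slots holding a relevant document.

    Short rankings are penalised: missing slots count as non-relevant.
    """
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical run: two spam pages interleaved with relevant pages.
    run = ["d1", "s1", "d2", "s2", "d3"]
    relevant = {"d1", "d2", "d3"}
    scores = {"s1": 0.9, "s2": 0.8, "d1": 0.1, "d2": 0.2, "d3": 0.1}

    print(precision_at_k(run, relevant, k=3))                       # before
    print(precision_at_k(filter_spam(run, scores), relevant, k=3))  # after
```

Filtering can only help a run when the removed pages were occupying slots that relevant pages move into, which is why the improvement shows up at fixed cutoffs such as the top 10.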

Get the paper at


News update, copyright guidance

April 30, 2010 2 comments

Now that we are past the SIGIR camera-ready deadline, I posted a call for submissions to IRlist.  We now have two submissions, and I hope we can post them next week.

I was asked by a potential author about copyright conflicts.  I corresponded with ACM and received the following guidance.  When you have a paper accepted to an ACM conference, you assign copyright to the ACM and retain some rights.  Those rights are detailed here.  I quote part of it here:

Under the ACM copyright transfer agreement, the original copyright holder retains:  …

  • the right to reuse any portion of the work, without fee, in future works of the author’s own, including books, lectures and presentations in all media, provided that the ACM citation and notice of the Copyright are included
  • the right to revise the work (See §2.4 Definitive Versions and Revisions), …

Authors may post works on public repositories before acceptance but must incorporate the ACM copyright notice upon transfer of copyright.
After acceptance, authors may post the work on public repositories only with the explicit permission of ACM.

I am not a lawyer, and for definitive opinions you should contact the ACM.  To me, it seems that you should either (a) post your paper as submitted to before you receive your reviews (say, the day before notification, so blinding remains effective), then update with your reviewer revisions and the ACM copyright notice after your acceptance, (b) revise the work after acceptance, perhaps by including further experiments that didn’t fit in the original paper, or (c) get permission from ACM to put the accepted paper in

Another option is not to transfer copyright to the ACM.  I personally don’t do this, because I think as a US Gov’t employee my work isn’t under copyright, so I have none to assign; the ACM has a special permission form for us Gov’t types because of this.  The all-around better solution for everyone is the CC license that uses, where you keep copyright but grant liberal usage.  I’m not sure what would happen if we all decided to dicker the copyright assignment form with ACM… maybe good things, maybe not.

The ACM of course is not the only collector of copyrights, but it’s the one I’ve covered here.  I recommend reading things before you sign them, and if you don’t agree, argue.

Categories: Meta

The gates are now open!

April 16, 2010

I’m pleased to announce that we are ready to open the gates for submissions.

We’ve updated the editorial guidelines (see the link on the side) to include submission guidelines and instructions.  The “how to” is pretty simple: for preprints, surveys, and reviews, you put your paper into and fill out the submission form (also linked on the side).  For technical correspondence, either submit plain text using the form, or send us a PDF.

The “what to submit” has changed a bit following a couple weeks of intense discussion among the editorial board and other friends.  Our central goal is to act as a rapid dissemination and discussion site for information retrieval research.  Therefore, our main thrust is preprints – approved preprints will be blogged here for comment and discussion.  Surveys and technical reviews also fit the paradigm.

Another forum missing in the IR community is technical correspondence.  Where can you write a detailed critique of something (say, pooling), and have a thoughtful written discussion on it?  Up to now, you could send it to a conference (long wait, eventual conversation) or to a journal (longer wait, no conversation), or post it on the net and hope.  We think that by making a central forum for this, we can drop the “hope” from the process.

We are still looking for editorial board members in several areas.  If you think you want to be a part of Not Relevant, drop me a line.

So, we’re ready!  Send us your work and get it in front of the research community, now.

Categories: Meta