SWIRL ’12 looks at The Future of IR

January 10, 2012

SWIRL (http://www.cs.rmit.edu.au/swirl12/index.php) is a workshop series that aims to explore the long-range issues in information retrieval.  The 2012 meeting will take place next month in Lorne, Victoria, Australia.

Participants were asked to nominate three papers that “represent important new directions, research areas, or results in the IR field.”  Since highlighting the bleeding (and soon-to-be-bleeding) edge is a goal of NR, I decided to mirror the list of nominated papers below.

(Original list is at http://www.cs.rmit.edu.au/swirl12/proceedings.php)

What do you think of this list? Are there other papers you feel should be on it? (Undoubtedly there are, as this is the expressly limited input of a few participants!) Which research do you think will have the strongest implications for IR over the next few years?

Categories: Conference, Survey

SIGIR 2011 Highlight: “Out of Sight, Not Out of Mind”

August 1, 2011

A “highlights” post is meant to call attention to a paper that grabbed our attention and that we think is worth your time. This paper, “Out of Sight, Not Out of Mind: On the Effect of Social and Physical Detachment on Information Need” by Elad Yom-Tov and Fernando Diaz, won an honorable mention last week at SIGIR 2011 in Beijing.

This paper examines the effect of social and physical distance on queries about events. The authors examined three cases: the San Bruno, CA gas line explosion of 9/9/2010, a violent storm in New York City on 9/16/2010, and the 2010 Senate election in Alaska. Data came from a query log of Yahoo! users. Physical distance was computed using the profile ZIP code of each user, and social distance was computed using the ZIP codes of the user's instant messenger buddies.
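The post does not say exactly how distance between a profile ZIP code and an event was measured. One common approach, sketched minimally here under that assumption, is to map each ZIP code to a centroid latitude/longitude and take the great-circle (haversine) distance to the event location. The `zip_centroids` lookup table and both function names are hypothetical, and this is not necessarily the authors' procedure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def user_event_distance(user_zip: str, event_latlon: tuple, zip_centroids: dict) -> float:
    """Distance from the centroid of a user's profile ZIP code to the event.

    zip_centroids is a hypothetical dict mapping ZIP code -> (lat, lon).
    """
    lat, lon = zip_centroids[user_zip]
    return haversine_km(lat, lon, event_latlon[0], event_latlon[1])
```

For example, `haversine_km(0.0, 0.0, 0.0, 1.0)` is about 111.2 km, one degree of longitude at the equator. Queries could then be bucketed by this distance to look at how query volume falls off.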

The paper finds, perhaps not surprisingly, that the volume of queries related to each event decreases with distance and time from the event, and that searchers with closer geographical and social ties both query more and query for different kinds of information.

I especially like this view of social search, and of local vs. general-informational search. I would love to see this work extended to more events and to events with a wider range of impact. The selected events had a local focus but achieved nationwide attention in the United States. It would be interesting to look at events with a longer time frame (trials, evolving events) and to consider whether some events have a larger impact socially than geographically.

@inproceedings{Yom-Tov:2011:OSO:2009916.2009970,
 author = {Yom-Tov, Elad and Diaz, Fernando},
 title = {Out of sight, not out of mind: on the effect of social 
          and physical detachment on information need},
 booktitle = {Proceedings of the 34th international ACM SIGIR conference 
              on Research and development in Information Retrieval},
 series = {SIGIR '11},
 year = {2011},
 isbn = {978-1-4503-0757-4},
 location = {Beijing, China},
 pages = {385--394},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2009916.2009970},
 doi = {10.1145/2009916.2009970},
 acmid = {2009970},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {distance, information, need, physical, social},
}

SIGIR 2011 Previews

June 1, 2011

Authors of accepted SIGIR 2011 papers have started posting them online, so I have decided to collect them here. Please comment on the ones you think are the most interesting and important. As I get more papers, I’ll edit this post to add them.

http://bit.ly/iYr7HB Nima Asadi, Don Metzler, Tamer Elsayed, and Jimmy Lin, “Pseudo Test Collections for Learning Web Search Ranking Functions”

Evangelos Kanoulas, Ben Carterette, Paul D. Clough, and Mark Sanderson, “Evaluating Multi-Query Sessions”

Michael Bendersky, Don Metzler, and Bruce Croft, “Parameterized Concept Weighting in Verbose Queries”

http://bit.ly/iAel3e Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis, “Intent-Aware Search Result Diversification”

http://bit.ly/iTiNhJ Elad Yom-Tov and Fernando Diaz, “Out of sight, not out of mind: On the effect of social and physical detachment on information need”

http://bit.ly/jxV5BV Tetsuya Sakai and Ruihua Song, “Evaluating Diversified Search Results Using Per-intent Graded Relevance”

http://bit.ly/jTrdQD Ferhan Ture, Tamer Elsayed, and Jimmy Lin, “No Free Lunch: Brute Force vs. Locality-Sensitive Hashing for Cross-lingual Pairwise Similarity”

http://bit.ly/loeoAh Ben Carterette, “System Effectiveness, User Models, and User Utility: A Conceptual Framework for Investigation”

http://bit.ly/jH2vNX Manos Tsagkias, Maarten de Rijke, and Wouter Weerkamp, “Hypergeometric Language Models for Republished Article Finding”

http://bit.ly/kVfJpQ Wouter Weerkamp, Bogomil Kovachev, Richard Berendsen, Edgar Meij, Krisztian Balog, and Maarten de Rijke, “People Searching for People: Analysis of a People Search Engine Log”

http://bit.ly/iX0ZZj David Elsweiler, Morgan Harvey, and Martin Hacker, “Understanding Re-finding behavior in Naturalistic Email Interaction Logs”

http://bit.ly/loqQ57 David Elsweiler, David E. Losada, José Carlos Toucedo, and Ronald T. Fernández, “Seeding Simulated Queries with User-study Data for Personal Search Evaluation”

http://tinyurl.com/3bsxql5 Aleksander Stupar and Sebastian Michel, “PICASSO – To Sing you must Close Your Eyes and Draw”

http://bit.ly/kYjT39 Avishek Anand, Srikanta Bedathur, Klaus Berberich, and Ralf Schenkel, “Temporal Index Sharding for Space-Time Efficiency in Archive Search”

Categories: Conference

Vote for the best CIKM 2010 papers

October 26, 2010

CIKM 2010 is taking place right now in Toronto, Ontario, Canada.  I have collected all the papers in the IR track below.  Please take a moment to vote for those papers you think are the most provocative, innovative, startling, or otherwise interesting.  You can vote for up to 3 papers.

I had to make a hard decision here to include only the IR track. CIKM has four tracks (IR, DB, KM, and industry), and there are entire sessions in the other tracks that are of interest to the IR community. Mostly I constrained this poll to the IR track to keep its size reasonable. There may be an additional poll for DB/KM/industry papers if the comments go that way.

After the conference, the Not Relevant editorial board will choose several papers based on the votes here, and post summaries of those papers.  The end result should be a must-read guide to CIKM 2010.

Categories: Meta, Review

Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets

May 12, 2010

Gordon V. Cormack, Mark D. Smucker, and Charles L. A. Clarke
University of Waterloo

The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam — pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset.

We show that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of “honeypot” queries, the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering: they move from among the worst to among the best.

Get the paper at arxiv.org
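The abstract does not spell out the classifier, so what follows is only a minimal Python sketch of the general idea under our own assumptions: each page is represented by hashed overlapping byte 4-grams, an online logistic regression learns a spam score, and a ranked run is filtered by dropping documents whose estimated spam probability is at or above a threshold. The feature choice, `DIM`, the learning rate, the 0.5 threshold, and all function names are hypothetical choices for illustration, not parameters reported by the authors.

```python
import math
import zlib

DIM = 1 << 20  # assumed size of the hashed feature space (hypothetical)


def features(text: str) -> set:
    """Binary features: overlapping byte 4-grams of the page, hashed into DIM buckets."""
    data = text.encode("utf-8", errors="ignore")
    return {zlib.crc32(data[i:i + 4]) % DIM for i in range(len(data) - 3)}


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Assumed learner: one SGD step on log loss per labeled page."""

    def __init__(self, rate: float = 0.002):
        self.w = [0.0] * DIM
        self.rate = rate  # hypothetical learning rate

    def spam_prob(self, feats: set) -> float:
        return sigmoid(sum(self.w[f] for f in feats))

    def train(self, feats: set, is_spam: bool) -> None:
        # Gradient of log loss w.r.t. each active binary feature is (label - p).
        err = (1.0 if is_spam else 0.0) - self.spam_prob(feats)
        for f in feats:
            self.w[f] += self.rate * err


def filter_run(ranked_ids, pages, model, max_spam_prob=0.5):
    """Drop documents the model considers likely spam, keeping the original order.

    ranked_ids: doc ids in rank order; pages: hypothetical dict id -> page text.
    """
    return [d for d in ranked_ids if model.spam_prob(features(pages[d])) < max_spam_prob]
```

Training on a handful of labeled pages (for instance, pages retrieved by “honeypot” queries treated as spam) and then calling `filter_run` on a submitted ranking shows the filter-then-rerank pattern; a real system would tune the threshold rather than fix it at 0.5.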


News update, copyright guidance

April 30, 2010

Now that we are past the SIGIR camera-ready deadline, I posted a call for submissions to IRlist.  We now have two submissions, and I hope we can post them next week.

A potential author asked me about copyright conflicts, so I corresponded with ACM and received the following guidance. When you have a paper accepted to an ACM conference, you assign copyright to the ACM and retain some rights. Those rights are detailed in ACM’s copyright policy; here is part of it:

Under the ACM copyright transfer agreement, the original copyright holder retains:  …

  • the right to reuse any portion of the work, without fee, in future works of the author’s own, including books, lectures and presentations in all media, provided that the ACM citation and notice of the Copyright are included
  • the right to revise the work (See §2.4 Definitive Versions and Revisions), …

Authors may post works on public repositories before acceptance but must incorporate the ACM copyright notice upon transfer of copyright.
After acceptance, authors may post the work on public repositories only with the explicit permission of ACM.

I am not a lawyer, and for definitive opinions you should contact the ACM. As I read it, you have three options: (a) post your paper as submitted to arxiv.org before you receive your reviews (say, the day before notification, so blinding remains effective), then update it with your reviewer revisions and the ACM copyright notice after acceptance; (b) revise the work after acceptance, perhaps by including further experiments that didn’t fit in the original paper; or (c) get permission from ACM to put the accepted paper on arxiv.org.

Another option is not to transfer copyright to the ACM at all. I personally don’t face this choice: as a US Gov’t employee, I believe my work isn’t under copyright, so I have none to assign; the ACM has a special permission form for us Gov’t types because of this. The all-around better solution for everyone is a CC license like those arxiv.org offers, where you keep copyright but grant liberal usage. I’m not sure what would happen if we all decided to dicker over the copyright assignment form with ACM… maybe good things, maybe not.

The ACM, of course, is not the only collector of copyrights, but it’s the one I’ve covered here. I recommend reading things before you sign them, and if you don’t agree, argue.

Categories: Meta

The gates are now open!

April 16, 2010

I’m pleased to announce that we are ready to open the gates for submissions.

We’ve updated the editorial guidelines (see the link on the side) to include submission guidelines and instructions.  The “how to” is pretty simple: for preprints, surveys, and reviews, you put your paper into arxiv.org and fill out the submission form (also linked on the side).  For technical correspondence, either submit plain text using the form, or send us a PDF.

The “what to submit” has changed a bit following a couple of weeks of intense discussion among the editorial board and other friends. Our central goal is to act as a rapid dissemination and discussion site for information retrieval research. Therefore, our main thrust is preprints: approved preprints will be blogged here for comment and discussion. Surveys and technical reviews also fit the arxiv.org paradigm.

Another forum missing in the IR community is technical correspondence.  Where can you write a detailed critique of something (say, pooling), and have a thoughtful written discussion on it?  Up to now, you could send it to a conference (long wait, eventual conversation) or to a journal (longer wait, no conversation), or post it on the net and hope.  We think that by making a central forum for this, we can drop the “hope” from the process.

We are still looking for editorial board members in several areas. If you want to be part of Not Relevant, drop me a line.

So, we’re ready!  Send us your work and get it in front of the research community, now.

Categories: Meta

Board, correspondence, errata, etc.

March 31, 2010

Things have been moving quickly over the past few days.  We’ve added more members to the editorial board, and the board has been discussing the form this journal should take.

I’ve set things up to publish posts in the following categories:

  • Research papers, as you might expect.  We also have a separate section for short research papers.
  • Letters.  This is technical correspondence and discussion.  The IR community doesn’t have a solid forum for this at the moment.
  • Errata and corrigenda.  Another missing feature in the IR community.
  • Reviews of books and conferences.  SIGIR Forum has this as well, but it would be nice if it were more timely.
  • Surveys.  Everyone’s doing these nowadays (sometimes calling them books); we’ll see where it goes.
  • Meta.  You’re probably tired of those by now.

Another idea we’ve been discussing is rapid release of in-progress research papers. The ordinary case for a research paper is that after submission, the board solicits reviews and makes a publication decision based on those. We intend a quicker turnaround than a traditional journal, but it’s not quite web speed.

Alternatively, authors could elect to have their submission posted while it is still in review. The editorial board would first decide whether the paper is sufficiently within scope that we would publish it if the reviews recommended it; if so, we post it right up on the blog in an “In progress” section. The paper gets reviewed as above, but at the same time the in-review version is available, so folks can download it, read it, and comment on it. Authors would want to do this for the quick feedback and community discussion, and we want it because it adds to the review process. There are still a few issues to be hammered out, but I’m hoping this will go live soon.

Categories: Meta

Welcome, a little more seriously

March 26, 2010

One or two people asked me if I was serious, and I guess the initial post was a bit silly in tone. So let’s try that again.

Welcome to Not Relevant, a new electronic journal for information retrieval research.  We want to publish bleeding-edge research papers, especially those that current IR conferences are rejecting due to an apparent lack of vision.  We will also publish letters, short research papers, reviews, and surveys.

Our editorial policies are being developed right now.  Here’s the prototype framework.  The goal is to publish solid advances in the state of the art, game-changing work, research that pushes the boundaries of IR.  We are not looking for incremental improvements.

Authors submit an article via email, and the editorial board solicits reviews from the information retrieval community.  These reviews are single-blind and confidential.  An editorial board member is appointed as the shepherd for the paper.  Once reviews are complete, the editorial board member writes an open (not blind) review summarizing the reviews.  If the paper is accepted, it is published on the blog along with the editorial review.

All published articles are open: there are no restrictions on access to articles aside from the authors’ copyright, which they hold. Not Relevant does not own anyone’s content but our own. Articles will be posted to the blog: each blog post is the abstract plus a link to the article hosted here. Posts will allow comments and ratings. We will monitor comments and moderate only as necessary. Publication is on a rolling basis, and we’ll come up with a reliable citation scheme.

Is this a real journal? Yep. It’s brand new and has no track record, but the members of the editorial board are all well-qualified researchers. We do not plan to publish substandard papers, and we hope you’ll help us build the reputation that makes a good journal.

You can help by joining the discussion here, emailing me, sending us papers, and/or volunteering as a reviewer or an editorial board member.

Categories: Meta

Welcome to Not Relevant

March 25, 2010

This blog was born out of reading complaints from lots of people that their SIGIR papers were rejected. OK, we’ve all been there; most of it is probably sour grapes, but surely there are some gems in there (including yours, my friend!).

So your paper got rejected with three two-liner reviews and a metareview that reads (summary) “Zzz. uh wah?”  What can you do?  You have three traditional choices:

  1. Send it off to the next conference (with optional revisions)
  2. Send it off to a journal (with optional revisions)
  3. Cry in your beer

But now you have a fourth option.  Send it to Not Relevant.  Our crack team of experienced IR researchers will review your paper (probably for the second time) and attempt to guide it towards publication here.  Think of us as an online journal for papers that the conference just doesn’t seem to get.

We aren’t agreeing to publish everything we get.  We are trying to find the discarded diamonds, the wheat in the chaff.  Everyone knows that conference review processes are fallible, and we want to be the sieve that holds the great stuff that they missed.

The editorial board and reviewing criteria are in process.  Stay tuned, or contact us if you want to be involved.

Categories: Meta