LBSC 878, Spring 2005, Week 1, Doug Oard
Overview
Course Objectives and Approach
Thinking broadly about “ISAR”
- Data-information-knowledge-wisdom
- Analysis-retrieval-synthesis-creation
- Content-system-process-purpose
- Technology-organization-access-use-context
Some terminology
- Data: Basic elements of meaning (assertions)
- Information: Assertions together with the context needed for their interpretation
- Knowledge: A basis for making decisions (Wilson’s "practical relevance")
- Wisdom: A basis for guiding decisions
Defining IR (Blair)
- Both Information Retrieval and Database Retrieval use queries to obtain information
- Database Retrieval retrieves data and combines it to produce information
- The query provides the context needed for interpretation of the retrieval result
- Information Retrieval retrieves objects (e.g., documents) that contain information
- The objects themselves provide the context needed for their interpretation
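Blair's distinction can be illustrated with a small sketch. Everything in it is an assumption made for illustration (the table, the course names, the two documents, and the helper names `enrollment_count` and `retrieve`); it is not drawn from Blair. The database query combines stored rows into an answer that only makes sense in light of the query, while the retrieval function hands back whole documents that the reader must interpret.

```python
import re
import sqlite3

# Database retrieval: the query combines stored data into information.
# The table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("Ana", "Course A"), ("Ben", "Course A"), ("Cy", "Course B")],
)


def enrollment_count(course):
    """Return a number whose meaning comes from the query that produced it."""
    row = conn.execute(
        "SELECT COUNT(*) FROM enrollment WHERE course = ?", (course,)
    ).fetchone()
    return row[0]


# Information retrieval: the system returns objects (here, document ids);
# each document carries its own context. The documents are hypothetical.
documents = {
    "doc1": "Enrollment in the retrieval seminar doubled this spring.",
    "doc2": "The library extended its opening hours.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query):
    """Return ids of documents sharing at least one term with the query."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items() if terms & tokenize(text)]


print(enrollment_count("Course A"))
print(retrieve("seminar enrollment"))
```

The bare count returned by `enrollment_count` means nothing without the query that produced it, whereas the document returned by `retrieve` can be read and understood on its own.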
Some examples of Information Retrieval (IR) applications
- Find something (an academic paper, an email I wrote two years ago, some class notes, ...) in a collection of written text, in any natural language, in any format (ASCII text, word processor files, scanned images of typeset pages, scanned images of handwritten manuscripts, ...)
- Find something (an object, the work of a particular artist, a street address, ...) in a collection of still images of some type (photographs, blueprints, oil paintings, maps, ...)
- Find something (spoken words, singing, instrumental music, ...) in a collection of recorded audio
- Find something (a person, the depiction of some event, ...) in a collection of video (video tape, motion picture film, ...) that may or may not also contain synchronized audio and text
- Find someone (coworker, consultant, speaker, ...) with the expertise that is required to help you accomplish some goal
- Sift through a large stream of continuously generated materials (newswire stories, electronic mail, television programs, telephone calls, ...) to find something worthy of your attention
- Explore a large collection of materials (documents, images, audio, video, ...) to identify some useful information (broad trends, new discoveries, unexpected events, ...)
Why is IR hard? (Blair)
- An indexer must try to guess which terms every searcher will use to look for each document
- A searcher must guess which terms the indexer chose
- For full text retrieval, the searcher must guess which terms the author chose
- Individual variability makes it impossible to do any of this perfectly
- Retrieval effectiveness is thus a matter of degree, rather than an absolute
- Effective retrieval thus becomes an iterative process
- Techniques are known to help the user choose good search terms (Croft’s "magic")
- But they tend to make the system less predictable, so it becomes harder to iterate
- With well-designed systems, it is often possible to eventually find useful documents
- But there are fundamental limits on what you can know about what you missed
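The guessing problem described above can be made concrete with a toy sketch. The three documents, the `SYNONYMS` table, and the function names are all hypothetical, and query expansion is used here only as one plausible example of a technique for choosing search terms; the notes do not say which techniques Croft has in mind.

```python
import re

# Hypothetical full-text collection: two authors describe the same
# topic using different words.
DOCS = {
    "d1": "Used car prices rose sharply last year.",
    "d2": "Automobile sales fell as fuel costs climbed.",
    "d3": "The city opened a new bicycle lane.",
}

# Hypothetical hand-built synonym table, assumed for illustration only.
SYNONYMS = {"car": {"automobile", "auto"}}


def tokens(text):
    """Lowercase the text and split it into alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query, expand=False):
    """Return ids of documents sharing at least one term with the query.

    With expand=True the system silently adds synonyms, so the terms
    actually matched are no longer exactly the ones the searcher typed.
    """
    terms = tokens(query)
    if expand:
        for term in list(terms):
            terms |= SYNONYMS.get(term, set())
    return sorted(doc_id for doc_id, text in DOCS.items() if terms & tokens(text))


print(search("car"))               # the searcher's first guess
print(search("automobile"))        # a reformulated second guess
print(search("car", expand=True))  # the system helps with the guessing
```

The first query misses d2 because its author wrote "automobile", and nothing in the result list tells the searcher that d2 exists. Reformulating the query recovers it, which is the iterative process at work. Expansion finds both documents at once, but only because the system matched terms the searcher never typed, which is one way such help makes the system's behavior harder to predict.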
Desiderata (Croft, roughly in order)
- Integration with non-IR systems
- Source selection
- Result merging
- Response time
- Handling multiple document formats
- Overcoming vocabulary mismatch
- Usability
- Selective dissemination
- Robustness
- Multiple modalities
- Rapid configuration for information extraction
- Predictability
- Handling documents written in multiple languages
- Data mining in text databases
- Text categorization
Meta-questions
- How do articles get published in D-Lib?
- Why did Blair write a book?
Focus Area Selection