Making sense of millions of pieces of information
can daunt scientists. At the June 2-4 conference "Developments and
Challenges in Mixture Models, Bump Hunting and Measurement Error
Models" at CWRU, scientists who called for help caught a lifeline
from statisticians, learning about new statistical techniques for
data mining. They discussed the challenges and exchanged new ideas.
The event attracted more than 80 participants
from the physical sciences, bioinformatics, engineering, information
technology and statistics. The unique gathering of scientists and
statisticians gave them the opportunity to share problems in
handling scientific information and to hear solutions
from leading statisticians.
"The conference was not huge, but it consisted
of many elite statisticians and top researchers and scientists
who work with data, which is everywhere in our lives," says Jiayang
Sun, CWRU professor of statistics and conference organizer.
The conference was co-sponsored by the
CWRU department of statistics and Hemant Ishwaran from the Cleveland
Clinic Foundation's department of epidemiology and biostatistics,
with support from the National Science Foundation, the Institute
of Mathematical Statistics, CWRU, the Cleveland Clinic and the
American Statistical Association's Cleveland Chapter.
Sun adds, "The complexity and size of data
information have increased rapidly. The challenge scientists face
calls for realistic and efficient procedures to deal with the
massive and varied amounts of information."
At the heart of this information boom is
the computer.
"With the advent of massive computer power
readily available on people's desks, the frontiers of statistics
also have moved tremendously fast in the past decade," says Peter
Hall, the conference's keynote speaker and a statistician from
Australian National University.
"While many people still associate statistics
with mathematics, it is its own science," says Sun, one in which
the computer has become the experimental lab for testing new
statistical methods.
Hall adds, "The scientists were surprised
to see the array of new methods for today's statistics and those
under development for tomorrow."
Clustering and bump hunting are two methods
for finding groups of data that share similar combinations of
features or that stand out from other information. Other techniques
address different problems: mixture models are useful for modeling
complex phenomena, and measurement error models can effectively
deal with imperfect measurements.
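As a rough illustration of the mixture-model idea, the sketch below fits two overlapping bell-shaped groups to simulated numbers using the standard EM algorithm. It is not taken from any conference talk; the data, the function names (`normal_pdf`, `fit_two_component_mixture`) and the starting values are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(data, iterations=200):
    """Fit a two-component 1-D Gaussian mixture with the EM algorithm."""
    # Crude starting values (an assumption): means at the extremes of the data.
    means = [min(data), max(data)]
    spread = (max(data) - min(data)) / 4.0 or 1.0
    stds = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: probability that each point belongs to component 0.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], stds[0])
            p1 = weights[1] * normal_pdf(x, means[1], stds[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)

        # M-step: re-estimate weights, means and spreads from those probabilities.
        for k in (0, 1):
            r = [p if k == 0 else 1.0 - p for p in resp]
            n_k = sum(r)
            if n_k < 1e-9:
                continue
            weights[k] = n_k / len(data)
            means[k] = sum(ri * x for ri, x in zip(r, data)) / n_k
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, data)) / n_k
            stds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component

    return weights, means, stds


# Simulated data (hypothetical): two overlapping groups centred at 0 and 5.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, stds = fit_two_component_mixture(sample)
print("weights:", [round(w, 2) for w in weights])
print("means:", [round(m, 2) for m in means])
```

The fitted means mark where the two "bumps" in the data sit, and the weights estimate how much of the sample belongs to each group. Real applications would typically rely on an established statistical package rather than a hand-written loop like this one.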
Hall points out that clustering particularly
interested astronomers, who have created virtual observatories
with telescopes controlled by computers that record enormous amounts
of information from radiation sources in the universe.
He also explained that doctors and biologists
see clustering as a tool for reading and deciphering the
genetic code and then relating patterns in the
code to certain diseases.
While scientists might work with volumes
of numbers, the Internet and email have posed problems for text
mining. Participants heard from Regina Y. Liu of Rutgers University,
who is working with the Federal Aviation Administration to track
regulatory compliance. "Properly analyzing these data can help
the FAA identify the areas of greater risk, and oversee more effectively
aviation operations and safety activities," according to Liu.
She discussed new statistical applications
that mined textual data from millions of reports about inspections
and findings and also pointed out how text classification procedures
"can be a critical element of the aviation safety decision support
system."
Mark Hansen from Bell Labs talked about
the challenges of studying human-generated data, such as email and
webpage information.
"The fruits of this conference are that
scientists went away with exposure to the problems that other
disciplines are working on and the difficulties these individuals
have. They now have a source of contacts," says Hall.
"As statisticians, we had the opportunity
to access real challenges out there and interact with scientists
and stimulate new statistics research," says Sun.
She adds that this is "a new era that needs
collaborative effort for modern scientific advances. Bumps, components,
clusters and atypical structures from real data often lead to
scientific discoveries or real interesting phenomena of a population.
They are important in astronomy, biology, data mining, bioinformatics
and in applications to virtually all natural and social sciences."