Campus News
Marketing and Communications

Scientists learn from stats forum
by Susan Griffith

Making sense of millions of pieces of information can daunt scientists. At the June 2-4 conference "Developments and Challenges in Mixture Models, Bump Hunting and Measurement Error Models" at CWRU, scientists calling for help caught a lifeline from statisticians, learning about new statistical techniques for data mining. Participants discussed the challenges and exchanged new ideas.

The event attracted more than 80 participants from the physical sciences, bioinformatics, engineering, information technology and statistics. The unique gathering gave scientists the opportunity to share their problems in handling scientific information and to hear solutions from leading statisticians.

"The conference was not huge, but it consisted of many elite statisticians and top researchers and scientists who work with data, which is everywhere in our lives," says Jiayang Sun, CWRU professor of statistics and conference organizer.

The conference was co-sponsored by the CWRU department of statistics and Hemant Ishwaran from the Cleveland Clinic Foundation's department of epidemiology and biostatistics, with support from the National Science Foundation, the Institute of Mathematical Statistics, CWRU, the Cleveland Clinic and the American Statistical Association's Cleveland Chapter.

Sun adds, "The complexity and size of data information have increased rapidly. The challenge scientists face calls for realistic and efficient procedures to deal with the massive and varied amounts of information."

At the heart of this information boom is the computer.

"With the advent of massive computer power readily available on people's desks, the frontiers of statistics also have moved tremendously fast in the past decade," says Peter Hall, the conference's keynote speaker and a statistician from Australian National University.

"While many people still associate statistics with mathematics, it is its own science," says Sun, adding that the computer has become the experimental lab for testing new statistical methods.

Hall adds, "The scientists were surprised to see the array of new methods for today's statistics and those under development for tomorrow."

Clustering and bump hunting are two methods for finding groups of data that show similar combinations of features or stand out from other information. Other techniques, such as mixture models, are useful for modeling complex phenomena, while measurement error models can effectively deal with imperfect measurements.

Hall points out that clustering particularly interested astronomers, who have created virtual observatories with telescopes controlled by computers that record enormous amounts of information from radiation sources in the universe.

He also explained that doctors and biologists recognized clustering as a tool for reading and deciphering the genetic code and then relating patterns in the code to certain diseases.

While scientists might work with volumes of numbers, the Internet and email have posed problems for text mining. Participants heard from Regina Y. Liu of Rutgers University, who is working with the Federal Aviation Administration to track regulatory compliance. "Properly analyzing these data can help the FAA identify the areas of greater risk, and oversee more effectively aviation operations and safety activities," according to Liu.

She discussed new statistical applications that mined textual data from millions of reports about inspections and findings and also pointed out how text classification procedures "can be a critical element of the aviation safety decision support system."

Mark Hansen from Bell Labs talked about the challenges of studying human-generated data, such as email and webpage information.

"The fruits of this conference are that scientists went away with exposure to the problems that other disciplines are working on and the difficulties these individuals have. They now have a source of contacts," says Hall.

"As statisticians, we had the opportunity to access real challenges out there and interact with scientists and stimulate new statistics research," says Sun.

She adds that this is "a new era that needs collaborative effort for modern scientific advances. Bumps, components, clusters and atypical structures from real data often lead to scientific discoveries or real interesting phenomena of a population. They are important in astronomy, biology, data mining, bioinformatics and in applications to virtually all natural and social sciences."

Return to the online edition of the 6-20 Campus News.

© 2003 Case Western Reserve University
This page last updated on: Thursday, 02-Dec-2004 12:27:44 EST