The big data challenge
LSU embarks on a $1 million quest to network its researchers and tackle issues ranging from genetics to disasters, work that could have implications for Louisiana business.
From engineering to coastal studies, top researchers at LSU easily share a campus. But their ability to share data across cyberspace is another story.
The data sets they collect, analyze and store in the course of their research are known as "big data." They are so large and detailed that they cannot be processed, searched and analyzed using traditional data management and analytics tools.
This data is often unstructured, so it is not formatted in uniform, predefined ways. As a result, researchers cannot readily share data, even though information that could prove useful for researchers in one discipline may lie within another discipline's data.
LSU researchers are looking for a solution to this big-data problem. They recently received a grant of almost $1 million from the National Science Foundation for a campus-wide project aimed at bringing big data computation capabilities to research groups across LSU.
The project, entitled "CC-NIE Integration: Bridging, Transferring and Analyzing Big Data over 10Gbps Campus-Wide Software Defined Networks," is led by Seung-Jong Park, an associate professor of computer science with a joint appointment in LSU's Center for Computation & Technology, working with co-investigators Gus Kousoulas, Lonnie Leger, Sean Robbins and Joel Tohline.
"Everyone is talking about big data—about how to collect, store and analyze vast amounts of information," says Kousoulas, who is associate vice chancellor for research and economic development and has joint appointments in the Department of Pathobiological Sciences in the School of Veterinary Medicine and in the School of Animal Science, LSU Agricultural Experiment Station, as well as adjunct appointments at the LSU Health Sciences Center in New Orleans. "The biggest challenges and breakthroughs of the 21st century will be derived from handling big data."
Big data permeates modern life. "We're accumulating more data now than in all of human history, and it's compounding at incredible rates," says Leger, director of networking at the Louisiana Optical Network Initiative. Big data is generated and collected from an ever-increasing number of sources—whether it be as mundane as mobile phones and their apps or credit card transactions, or as scientifically significant as the Large Hadron Collider in Switzerland.
The data is then analyzed to find solutions and inform decisions in areas ranging from marketing to transportation, shipping to social networking, and health care to genetics. At LSU, for example, researchers are working on big-data applications such as human and animal genome sequencing and coastal hazard simulations that predict flooding levels before a storm hits.
Park says the NSF-funded big-data project's title reflects three key concepts: bridging, transferring and analyzing. "At LSU, we've got a high number of supercomputers that process data, but they use traditional applications, which are not adapted for big data," he says. The grant will help change this by enabling Park and his co-investigators to construct the needed cyberinfrastructure, including hardware and software, to allow for better use and connectivity at LSU.
"Accomplishing the goal of big data involves building a new high-speed network for LSU, developing software to transfer big data from one supercomputer to another, and then analyzing the data" using data-intensive distributed computing frameworks, something that is not currently possible, Park says.
According to Tohline, director of CCT and professor of computational methods in the Department of Physics and Astronomy, with data easily running into the tens if not hundreds of terabytes, storage is critical to the project's success.
"In the past decade, we've needed a lot of computing capabilities, but we haven't had to pay a lot of attention to storage or curating the data. That's rapidly changing," he says. "We haven't had storage where we need it, so we can't move the big data. This award from NSF funds a project to show how a university might pull in that storage piece and fluidly tie it into the network."
LSU's scientific big-data problems, such as genome sequence analysis, require not only data-storage capabilities but also computing capabilities, including main memory measured in terabytes. Big data is processed in computer memory rather than on hard disk. Here's how that compares with your own technology needs at home or at work, according to Seung-Jong Park, an associate professor of computer science at LSU with a joint appointment in the Center for Computation & Technology.
Home: 1 CPU with 8 gigabytes of main memory
Business: 2 CPUs with 1 terabyte of main memory
LSU: 700 CPUs and 12 terabytes of total memory that can be used for big-data research
The big-data project
The goal is to speed up big-data analysis by as much as 100-fold compared with running on 1 or 2 CPUs. To do that, researchers are developing Hadoop-based distributed software for the supercomputer that runs hundreds of compute nodes in parallel.
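Hadoop programs typically follow the MapReduce model: a map step processes pieces of the data independently (on many nodes at once), a shuffle step groups the intermediate results by key, and a reduce step combines each group. The minimal Python sketch below illustrates that pattern with k-mer counting, a task common in genome analysis. It is not LSU's software and does not use Hadoop's API; the function names, the k-mer length K and the sample reads are hypothetical, and a thread pool stands in for the cluster's compute nodes, so it shows the structure of the computation rather than any speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

K = 3  # hypothetical k-mer length, chosen only for illustration


def map_kmers(read):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one DNA read."""
    return [(read[i:i + K], 1) for i in range(len(read) - K + 1)]


def shuffle(mapped_outputs):
    """Shuffle step: group all emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(item):
    """Reduce step: combine every value emitted for a single key."""
    key, values = item
    return key, sum(values)


def count_kmers(reads, workers=4):
    """Run map, shuffle and reduce; the thread pool stands in for compute nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_kmers, reads))   # map tasks run independently
        grouped = shuffle(mapped)                   # gather values per key
        reduced = pool.map(reduce_counts, grouped.items())  # one reduce per key
        return dict(reduced)
```

Run on the three short hypothetical reads GATTACA, TACAGAT and ATTAC, count_kmers reports the k-mer TAC three times and CAG once. Because each map call touches only its own read and each reduce call only its own key, a framework like Hadoop can spread those calls across many machines.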
By providing advanced information technologies and cyberinfrastructure, the grant will have a significant impact on LSU's research capabilities.
"Say, for example, that researchers in the chemistry department have some data and they store it on their computer so they can pull parts of it for use in their research," says Robbins, director of NI (networking infrastructure) engineering at LSU's Information Technology Services. "They might have something in that data that would benefit other researchers, but they don't know it. And those other researchers don't know to go look at and mine the chemistry data. This project will give people a mechanism to mine that data for their own research, which greatly enhances the possibilities of answers for their research."
Yet given the volume of data being captured, finding that one nugget of correlating data "is like finding a needle in a haystack," Kousoulas says. "You may be looking for a single mutation among billions of patients." However, as Robbins observes, "That one record of information in a database may be what makes a difference. This could be how we do research in the future."
According to Kousoulas, LSU is at the forefront of such efforts. "The LSU System is taking the lead right now by proposing to NIH integration of all Louisiana-based medical databases that can be 'mined' by big data analytics with expected benefits to both health delivery and outcomes," he says.
The LSU faculty involved in the project reflect the multidisciplinary impacts of big data. Says Tohline: "We're looking at big data from a global perspective—not just one discipline."
Robbins notes that a multidisciplinary approach changes the amount of information that any single researcher has access to. "It exponentially expands the horizons of collaborative efforts," he says.
The results, Park says, can mean new discoveries and research programs delivered more quickly. "We can actually enable several areas of research with our framework—that's the beauty of this project," he says.
The effects of big-data research will be felt well beyond the LSU campus into big business. "The airline industry, companies like Google or Walmart—they are all trying to move data around," Tohline says. "The problem is much bigger than us. It extends into private industry and government." Indeed, Samsung Electronics is participating in the LSU project as an industrial collaborator.
The overall big data effort could also affect economic development in Baton Rouge, as well as throughout Louisiana. "This NSF cyberinfrastructure grant allows us to create a model for a much larger network that will enable the LSU System and, hopefully, the state of Louisiana, to be a player in big data," Kousoulas says. "It gives us the necessary tools and manpower to give the state of Louisiana a competitive advantage, and will help LSU increase its competitiveness and development of its computation resources."
Leger agrees. "I think we'll be delivering a better product in the knowledge economy. We'll be better at research and solving problems with a multidisciplinary and multilocational approach," he says. "As a result, we're going to grow and birth ideas in innovation that we didn't know we were going to discover."
For Robbins, that's the most exciting aspect of LSU's big-data initiative.
"This research we're doing at LSU could have really tangible benefits to humanity," he says. "Knowing that the place where I love to work could impact humanity for the better sheds light on people's research and explains why we're all so passionate about it."
Big data is information that is too large, complex and dynamic for conventional data tools to capture, store, manage and analyze. If harnessed, it allows analysts to spot trends and gain niche insights that create value and drive innovation much faster than conventional methods, benefiting both businesses and consumers. Big data is typically characterized by four V's: value, volume, velocity and variety.