Archive for the ‘information insight’ Category
Live @ Information On Demand 2012: A Q&A With Nate Silver On The Promise Of Prediction
Day 3 at Information On Demand 2012.
The “Think Big” theme continued, so Scott Laningham and I sat down very early this morning with Nate Silver, blogger and author of “The Signal and the Noise,” now a New York Times bestseller (you can read the Times’ review of the book here).
Nate, a youngish 34, has become one of our leading statisticians through his innovative analyses of political polling, but he first made his name by building a widely acclaimed baseball statistical analysis system called “PECOTA.”
Today, Nate runs the award-winning political website FiveThirtyEight.com, which is now published in The New York Times and which has made Nate the public face of statistical analysis and political forecasting.
In his book, the full title of which is “The Signal and The Noise: Why Most Predictions Fail — But Some Don’t,” Silver explores how data-based predictions underpin a growing sector of critical fields, from political polling to weather forecasting to the stock market to chess to the war on terror.
In the book, Nate poses some key questions: What kinds of predictions can we trust, and are the “predictors” using reliable methods? What sorts of things can, and cannot, be predicted?
In our conversation in the greenroom just prior to his keynote at Information On Demand 2012 earlier today, Scott and I probed along a number of these vectors, asking Nate about the importance of prediction in Big Data, statistical influence on sports and player predictions (a la “Moneyball”), how large organizations can improve their predictive capabilities, and much more.
It was a refreshing and eye-opening interview, and I hope you enjoy watching it as much as Scott and I enjoyed conducting it!
Big Study On Big Data
Perfect timing.
In advance of IBM’s massive event next week in Las Vegas featuring all things information management, Information On Demand 2012, IBM and the Saïd Business School at the University of Oxford today released a study on Big Data.

According to a new global report from IBM and the Saïd Business School at the University of Oxford, less than half of the organizations engaged in active Big Data initiatives are currently analyzing external sources of data, such as social media.
The headline: Most Big Data initiatives currently being deployed by organizations are aimed at improving the customer experience, yet less than half of the organizations involved in active Big Data initiatives are currently collecting and analyzing external sources of data, like social media.
One reason: Many organizations are struggling to address and manage the uncertainty inherent in certain types of data, such as the weather, the economy, or the sentiment and truthfulness expressed by people on social networks.
Another? Social media and other external data sources are underutilized because of a skills gap. Having the advanced capabilities required to analyze unstructured data (data that does not fit in traditional databases, such as text, sensor data, geospatial data, audio, images and video), as well as streaming data, remains a major challenge for most organizations.
The new report, entitled “Analytics: The real-world use of Big Data,” is based on a global survey of 1,144 business and IT professionals from 95 countries and 26 industries. The report provides a global snapshot of how organizations today view Big Data, how they are building essential capabilities to tackle Big Data and to what extent they are currently engaged in using Big Data to benefit their business.
Only 25 percent of the survey respondents say they have the required capabilities to analyze highly unstructured data — a major inhibitor to getting the most value from Big Data.
The increasing business opportunities and benefits of Big Data are clear. Nearly two-thirds (63 percent) of the survey respondents report that using information, including Big Data, and analytics is creating a competitive advantage for their organizations. This is a 70 percent increase from the 37 percent who cited a competitive advantage in a 2010 IBM study.
Big Data Drivers and Adoption
In addition to customer-centric outcomes, which half (49 percent) of the respondents identified as a top priority, early applications of Big Data are addressing other functional objectives.
Nearly one-fifth (18 percent) cited optimizing operations as a primary objective. Other Big Data applications are focused on risk and financial management (15 percent), enabling new business models (14 percent) and employee collaboration (4 percent).
Three-quarters (76 percent) of the respondents are currently engaged in Big Data development efforts, but the report confirms that nearly half (47 percent) are still in the early planning stages.
However, 28 percent are developing pilot projects or have already implemented two or more Big Data solutions at scale. Nearly one quarter (24 percent) of the respondents have not initiated Big Data activities, and are still studying how Big Data will benefit their organizations.
Sources of Big Data
More than half of the survey respondents reported internal data as the primary source of Big Data within their organizations. This suggests that companies are taking a pragmatic approach to Big Data, and also that there is tremendous untapped value still locked away in these internal systems.
Internal data is the most mature, well-understood data available to organizations. The data has been collected, integrated, structured and standardized through years of enterprise resource planning, master data management, business intelligence and other related work.
By applying analytics, internal data extracted from customer transactions, interactions, events and emails can provide valuable insights.
Big Data Capabilities
Today, the majority of organizations engaged in Big Data activities start with analyzing structured data using core analytics capabilities, such as query and reporting (91 percent) and data mining (77 percent).
Two-thirds (67 percent) report using predictive modeling skills.
But Big Data also requires the capability to analyze semi-structured and unstructured data, including a variety of data types that may be entirely new for many organizations.
In more than half of the active Big Data efforts, respondents reported using advanced capabilities designed to analyze text in its natural state, such as the transcripts of call center conversations.
These analytics include the ability to interpret and understand the nuances of language, such as sentiment, slang and intentions. Such data can help companies such as banks or telco providers understand the current mood of a customer and gain valuable insights that can be put to work immediately in customer management strategies.
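The report doesn’t prescribe any particular tooling, but to make the idea concrete, here is a minimal sketch (in Python, with a hypothetical lexicon and transcripts) of the kind of lexicon-based sentiment scoring a call center might apply to its transcripts. Real text analytics model language, slang and intent far more richly than this.

```python
# A minimal, illustrative sketch of lexicon-based sentiment scoring for
# call-center transcripts. The lexicon and transcripts are hypothetical.

POSITIVE = {"great", "thanks", "helpful", "resolved", "happy"}
NEGATIVE = {"cancel", "frustrated", "angry", "broken", "unacceptable"}

def sentiment_score(transcript: str) -> float:
    """Return a score in [-1, 1]; negative values suggest an unhappy customer."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

calls = [
    "I am frustrated, the router is still broken and I want to cancel.",
    "Thanks, the agent was helpful and my issue is resolved.",
]
for call in calls:
    print(f"{sentiment_score(call):+.2f}  {call}")
```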
You can download and read the full study here.
Update: Also check out the new IBM Big Data Hub, a compendium of videos, blog posts, podcasts, white papers, and other useful assets centering on this big topic!
Look To The Heavens
If you’ve ever fancied yourself a sort of Walter Mitty-ish astronomer, you’re going to like this one.
IBM announced today that the Victoria University of Wellington, on behalf of the Murchison Widefield Array (MWA) Consortium, has selected IBM systems technology to help scientists probe the origins of the universe.
This effort is the result of an international collaboration among 13 institutions from Australia, New Zealand, the U.S. and India. The MWA is a new type of radio telescope designed to capture low-frequency radio waves from deep space, as well as the volatile atmospheric conditions of the Sun.
The telescope’s 4,096 dipole antennas, positioned in the Australian Outback, will capture the signals in a continuous stream, which will then be processed by an IBM iDataPlex dx360 M3 computing cluster that converts the radio waves into wide-field images of the sky of unprecedented clarity and detail.
The IBM iDataPlex cluster will replace MWA’s existing custom-made hardware systems and will enable greater flexibility and increased signal processing.
The cluster is expected to process approximately 50 terabytes of data per day, at rates of up to 8 gigabytes per second (the equivalent of more than 2,000 digital songs per second), allowing scientists to study more of the sky faster, and in greater detail, than ever before.
The ultimate goal of this revolutionary $51 million MWA telescope is to observe the early universe, when stars and galaxies were first born.
By detecting and studying the weak radio signals emitted from when the universe consisted of only a dark void of hydrogen gas — the cosmic “dark age” — scientists hope to understand how stars, planets and galaxies were formed. The telescope will also be used by scientists to study the sun’s heliosphere during periods of strong solar activity and time-varying astronomical objects such as pulsars.
The IBM iDataPlex cluster will be housed at the Murchison Radio Observatory (MRO) site, around 700 km north of Perth, near the radio telescope antennas.
A 10 Gbps communications link to Perth will allow the images to be transferred, stored and made available for research. The MRO site will also be the Australian location for a significant portion of the Square Kilometre Array (SKA), which will be the world’s most powerful radio telescope and is being co-hosted by Australia and South Africa.
The MWA project is led by the International Centre for Radio Astronomy Research at Curtin University and is one of three SKA precursor telescopes.
You can learn more about the MWA telescope here.
Live @ IBM Smarter Commerce Global Summit Madrid: IBM Product Manager Mark Frigon On Smarter Web Analytics & Privacy

Mark Frigon is a senior product manager with IBM’s Enterprise Marketing Management organization, a key group involved in leading IBM’s Smarter Commerce initiative. Mark’s specialties are in Web analytics (he joined IBM as part of its acquisition of Coremetrics) and Internet privacy, an issue that has come to the forefront in recent years for digital marketers around the globe.
Effective Web metrics are critical to the success of businesses looking to succeed in e-commerce and digital marketing these days, and IBM has a number of experts who spend a lot of their time in this area.
One of those here in Madrid at the IBM Smarter Commerce Global Summit, Mark Frigon, is a senior product manager for Web analytics in IBM’s Enterprise Marketing Management organization.
Mark sat down with me to discuss the changing nature of Web analytics and how dramatically it has evolved as a discipline over the past few years, including marketers’ increased focus on “attribution,” the ability to directly correlate a Web marketing action with a desired result.
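Attribution logic varies widely from vendor to vendor; the sketch below is not IBM’s Coremetrics implementation, just a hypothetical Python illustration of how two common models, last-touch and linear, divide credit for a conversion across the marketing touches that preceded it.

```python
# Hypothetical illustration of two common attribution models: all credit to
# the final touch, versus credit split evenly across every touch.
from collections import defaultdict

def attribute(journeys, model="last_touch"):
    """journeys: list of (touchpoints, revenue) tuples; returns credit per channel."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        if not touches:
            continue
        if model == "last_touch":
            credit[touches[-1]] += revenue            # all credit to the last touch
        elif model == "linear":
            for touch in touches:                     # split credit evenly
                credit[touch] += revenue / len(touches)
    return dict(credit)

journeys = [(["display", "email", "search"], 100.0),
            (["search"], 40.0),
            (["email", "search"], 60.0)]
print(attribute(journeys, "last_touch"))
print(attribute(journeys, "linear"))
```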
Mark also spoke at the event about the importance of digital marketers around the globe becoming more privacy-aware, a topic we also discussed in our time together, calling out in particular the “Do-Not-Track” industry self-regulatory effort, which aims to put privacy controls in the hands of consumers.
If you spend any time thinking about Internet privacy or Web analytics, or both, this is a conversation you won’t want to miss.
Out Of Africa: IBM And National Geographic Map The Human Genography
My father’s name is James Watson. But alas, he’s no relation to the co-discoverer of DNA.

Using new analytical capabilities, IBM and the Genographic Project have found new evidence to support a southern route of human migration from Africa before any movement heading north, suggesting a special role for south Asia in the "out of Africa" expansion of modern humans.
Just as I’m no relation to the founder of IBM, T.J. Watson. You’d be surprised how many people ask if I am related to the IBM Watsons. I politely explain that if I were, I’d probably be on a yacht in the Caribbean somewhere. And I still aspire to discover a long lost genetic connection.
But who I AM related to is part of the study being released today by the National Geographic Society in partnership with IBM’s Genographic Project.
Flash back a few million years.
Evolutionary history has demonstrated that human populations likely originated in Africa. The Genographic Project, the most extensive survey of human population genetic data to date, suggests where they went next.
No, not to McDonalds. That came much later.
Rather, a study by the project finds that modern humans migrated out of Africa via a southern route through Arabia, rather than a northern route by way of Egypt.
It’s those findings that will be highlighted today at the National Geographic Society conference.
Mapping The Human Genography
National Geographic and IBM’s Genographic Project scientific consortium have developed a new analytical method that traces the relationship between genetic sequences from patterns of recombination — the process by which molecules of DNA are broken up and recombine to form new pairs.
Ninety-nine percent of the human genome goes through this shuffling process as DNA is transmitted from one generation to the next, yet these genomic regions have remained largely unexplored as a window into the history of human migration.
By looking at similarities in patterns of DNA recombination that have been passed down in disparate populations, Genographic scientists confirmed that African populations are the most diverse on Earth, and that the diversity of lineages outside Africa is a subset of that found on the continent.
The divergence of populations from a common genetic history showed that Eurasian groups were more similar to populations from southern India than they were to those in Africa.
This supports a southern route of migration from Africa, via the Bab-el-Mandeb Strait into Arabia, before any movement heading north, and suggests a special role for south Asia in the “out of Africa” expansion of modern humans.
The new analytical method looks at recombination of chromosomes over time, one determinant of how new gene sequences are created in subsequent generations.
Imagine a recombining chromosome as a deck of cards. When a pair of chromosomes is shuffled together, it creates combinations of DNA. This recombination process occurs through the generations.
Recombination contributes to genome diversity in 99% of the human genome. However, many believed it was impossible to map the recombinational history of DNA due to the complex, overlapping patterns created in every generation.
Now, by applying detailed computational methods and powerful algorithms, scientists can provide new evidence on the size and history of ancient populations.
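The project’s actual algorithms operate on population-scale genotype data, but the deck-of-cards analogy itself is easy to sketch. Here is a purely illustrative Python snippet simulating a single-crossover recombination of two toy parental sequences:

```python
# Purely illustrative: the "shuffled deck" analogy as a single-crossover
# recombination of two toy parental chromosome sequences. Real recombination
# mapping works over population-scale genotype data, not strings like these.
import random

def recombine(parent_a, parent_b, rng=random):
    """Return a child sequence formed by one crossover between two parents."""
    assert len(parent_a) == len(parent_b)
    point = rng.randint(1, len(parent_a) - 1)   # crossover position
    return parent_a[:point] + parent_b[point:]

child = recombine("AAAAAAAAAA", "GGGGGGGGGG", rng=random.Random(7))
print(child)  # a prefix from one parent joined to a suffix from the other
```

Repeated over many generations, these crossovers produce the overlapping patterns that make recombinational history so hard, and so informative, to reconstruct.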
Ajay Royyuru, senior manager at IBM’s Computational Biology Center, had this to say about the effort: “Over the past six years, we’ve had the opportunity to gather and analyze genetic data around the world at a scale and level of detail that has never been done before.
“When we started, our goal was to bring science expeditions into the modern era to further a deeper understanding of human roots and diversity. With evidence that the genetic diversity in southern India is closer to Africa than that of Europe, this suggests that other fields of research such as archaeology and anthropology should look for additional evidence on the migration route of early humans to further explore this theory.”
Filling In The Genographic Gaps
The Genographic Project continues to fill in the gaps of our knowledge of the history of humankind and unlock information from our genetic roots that not only impacts our personal stories, but can reveal new dimensions of civilizations, cultures and societies over the past tens of thousands of years.
“The application of new analytical methods, such as this study of recombinational diversity, highlights the strength of the Genographic Project’s approach. Having assembled a tremendous resource in the form of our global sample collection and standardized database, we can begin to apply new methods of genetic analysis to provide greater insights into the migratory history of our species,” said Genographic Project Director Spencer Wells.
The recombination study highlights the initial six-year effort by the Genographic Project to create the most comprehensive survey of human genetic variation using DNA contributed by indigenous peoples and members of the general public, in order to map how the Earth was populated.
Nearly 500,000 individuals have participated in the Project with field research conducted by 11 regional centers to advance the science and understanding of migratory genealogy. This database is one of the largest collections of human population genetic information ever assembled and serves as an unprecedented resource for geneticists, historians and anthropologists.
At the core of the project is a global consortium of 11 regional scientific teams who follow a shared ethical and scientific framework and are responsible for sample collection and analysis in their respective regions.
The Project is open to members of the public to participate through purchasing a public participation kit from the Genographic Web site (www.nationalgeographic.com/genographic), where they can also choose to donate their genetic results to the expanding database. Sales of the kits help fund research and support a Legacy Fund for indigenous and traditional peoples’ community-led language revitalization and cultural projects.
Information on Demand 2011: Big Data, Bigger Insights
Greetings from Viva Las Vegas, Nevada.
The CNN Republican debate is long over, the media circus has packed up, and the information gatherers for IBM Information on Demand 2011 are arriving en masse.
My Webcasting partner-in-crime, Scott Laningham, and I arrived here yesterday mostly without incident. We scoped out the situation, and decided that the Mandalay Bay Race and Sports Book was the perfect venue to sit down, have a burger, and watch the third game of the World Series.
Since baseball and data are going to be an underlying theme of Michael Lewis and Billy Beane’s keynote about Moneyball later this week, it only seemed appropriate.
And though my Texas Rangers ended up taking a beating, we did witness some new data added to the baseball history books: the Cardinals’ Albert Pujols tied Babe Ruth and Reggie Jackson for the most home runs hit in a single World Series game, the magic number three (to be precise, the Babe did it twice).
And though you may never be able to fully predict the specific outcome of a single baseball game, Billy Beane and his Oakland A’s team proved that you can use past player performance statistics to help build a better team, one that could compete with the “big money” teams.
Okay, so if past performance can help predict future results, where does that leave us Information On Demanders for this 2011 event?
Let’s start with the business benefit, which in these tough times is necessary for even the most profitable of enterprises.
IBM studies have demonstrated that the performance gap between the leaders and the laggards and followers is widening: organizations that apply advanced analytics see 33% more revenue growth and 12X more profit growth.
That ought to get some executive attention.
But we’re also seeing some major shifts in the external environment. Information is exploding. We’ve now got over 1 trillion devices connected to the Internet, and we’re expecting 44X digital data growth through 2020.
And yet we’re also finding that business change is outpacing our ability to keep up with it all: 60% of CEOs agree they have more data than they can use effectively, and yet 4 out of 5 business leaders see information as a vital source of competitive advantage.
So what’s the remedy? Well, those flying in to Vegas have taken the first step, admitting they have a problem (No, not “The Hangover” type problems — you’ll have to talk to Mike Tyson about those).
No, successful organizations are turning all that data into actionable insight by taking a more structured approach through business analytics and optimization (BAO).
They’re embracing it as a transformational imperative, demonstrating that they can improve visibility throughout the enterprise, enhance their understanding of their customers, and foster collaborative decision-making, all while providing key predictive insights and optimizing real-time decision making.
So, like a good baseball player, or manager, your job over the next several days here in Vegas is to do a few key things, and do them well.
Focus, keep your eye on the ball and on the topics most important and relevant to you.
Listen, both in the general sessions and the individual tracks, and in those all-important hallway conversations; you never know what you might learn.
Participate, particularly in the social media. We IBMers and our key partners want to hear from you, and we’re only a Tweet away. Use conference hashtag #iodgc2011 to speak up, as we’re listening in return.
Commit, to the actions coming out of the event that you think will be helpful to you and your organization, and to turning those business and technology goals into reality.
And one other thing…have fun! Whatever happens in Vegas may not stay in Vegas…it may even end up on Facebook…but that shouldn’t stop you from having a good time and learning a lot this week.
As for Scott Laningham and myself, we’ll be blogging and covering key sessions, and “livestreaming” from the Expo floor. Stop by and say hello.
Digital Diet
We’re less than 24 hours away from financial Armageddon. I’ve been stocking up on water and non-perishables in my garage, just in case.
No no, no tin foil helmet radio for me. Justttt kiddingg.
I’m confident our politicians are going to reach some fiscal sanity, and my understanding of the process is that the Senate is about to vote on the bill passed last evening by the House, so my fingers are crossed.
But let it be known that the U.S. Congress isn’t the only legislative body that’s busy attending to the peoples’ business.
In Japan, IBM announced yesterday that it’s helping the National Diet Library of Japan, the country’s only national library, to digitize its literary artifacts on a massive scale to make them widely available and searchable online (The Diet is the legislative body in Japan).
The prototype technology enabling the system was built by IBM Research. It allows full-text digitization of Japanese literature to be realized quickly through expansive recognition of Japanese characters, and it enables users to collaboratively review and correct characters, script and structure.
The system is also designed to promote future international collaborations and standardization of libraries around the world.
“Nearly two decades ago in his book Digital Library, Dr. Makoto Nagao, the director of the National Diet Library, shared his vision that digitized and structured electronic books will dramatically change the role of libraries and the way knowledge will be shared and reused in our society,” said Dr. Hironobu Takagi, who led the development of the prototype technology at IBM Research – Tokyo.
“Until now, the breadth of the characters and expressions within the Japanese language had posed a series of challenges to massive digitization. In order to enable this transfer of knowledge from print to online, we realized the need for both machine and human intelligence to understand information in every form.”
Compared with other languages, which rely on just a few dozen alphabetical characters, Japanese is extremely diverse in terms of script. In addition to the syllabary characters hiragana and katakana, Japanese includes about 10,000 kanji characters (including old characters, variants and the 2,136 commonly used characters), as well as ruby (small syllabary characters printed next to kanji as reading aids) and mixed vertical and horizontal text.
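To get a feel for why Japanese text recognition is so much harder than recognizing an alphabetic script, here is a small, hypothetical Python sketch (not the IBM prototype) that classifies each character of a sentence by its Unicode block:

```python
# A small sketch (not the IBM prototype) of why Japanese OCR must juggle
# several scripts at once: classify each character by its Unicode block.
from collections import Counter

def classify_char(ch: str) -> str:
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"

# Roughly: "The National Diet Library advances digitization"
sample = "国立国会図書館はデジタル化をすすめる"
print(Counter(classify_char(c) for c in sample))
```

Even this one short sentence mixes kanji, hiragana and katakana, and a real digitization pipeline must also cope with old character forms, ruby annotations and vertical layouts.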
Aside from ensuring quality recognition of Japanese characters, IBM researchers aimed to reduce the amount of time needed to review and verify the accuracy of the digitized texts. By introducing unique collaborative crowdsourcing tools, the technology allows many users to quickly pore through the texts and make corrections with far greater productivity and efficiency.