Archive for the ‘big data’ Category
Big Moves In Big Data: IBM New Data Acceleration, Hadoop Capabilities

IBM made a significant announcement earlier today concerning new technologies designed to help companies and governments tackle Big Data by making it simpler, faster and more economical to analyze massive amounts of data. The new data acceleration innovation results in as much as 25 times faster reporting and analytics.
Today’s announcement, which represents the work of hundreds of IBM developers and researchers in labs around the world, includes an industry-first innovation called “BLU Acceleration,” which combines a number of techniques to dramatically improve analytical performance and simplify administration.
Also announced was the new IBM PureData System for Hadoop, designed to make it easier and faster to deploy Hadoop in the enterprise. Hadoop is the game-changing open-source software used to organize and analyze vast amounts of structured and unstructured data, such as posts to social media sites, digital pictures and videos, online transaction records, and cell phone location data.
The new system can reduce from weeks to minutes the ramp-up time organizations need to adopt enterprise-class Hadoop technology with powerful, easy-to-use analytic tools and visualization for both business analysts and data scientists.
In addition, it provides enhanced Big Data tools for monitoring, development and integration with many more enterprise systems.
IBM Big Data Innovations: More Accessible, Enterprise-ready
As organizations grapple with a flood of structured and unstructured data generated by computers, mobile devices, sensors and social networks, they’re under unprecedented pressure to analyze much more data at faster speeds and at lower costs to help deepen customer relationships, prevent threat and fraud, and identify new revenue opportunities.
BLU Acceleration gives users much faster access to key information, leading to better decision-making. The software extends the capabilities of traditional in-memory systems, which load data into Random Access Memory instead of reading it from hard disks for faster performance, by providing in-memory performance even when data sets exceed the size of the memory.
During testing, some queries in a typical analytics workload were more than 1000 times faster when using the combined innovations of BLU Acceleration.
Innovations in BLU Acceleration include “data skipping,” which lets the system skip over data that doesn’t need to be analyzed, such as duplicate information; the ability to analyze data in parallel across different processors; and the ability to analyze data transparently to the application, without the need to develop a separate layer of data modeling.
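To make the data skipping idea a little more concrete, here is a minimal Python sketch. It is not IBM's implementation, which the announcement does not describe in detail. It assumes one common way to skip data: keep a small summary (minimum and maximum) for each block of a column and pass over any block whose summary shows it cannot match the query. The names `Block`, `build_blocks` and `count_greater_than`, along with the sample sales figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column plus a tiny synopsis (min and max) of its values."""
    values: List[int]
    lo: int
    hi: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks: List[Block], threshold: int) -> Tuple[int, int]:
    """Count values > threshold, skipping blocks whose max rules them out.

    Returns (matching_count, blocks_scanned).
    """
    matches = 0
    scanned = 0
    for block in blocks:
        if block.hi <= threshold:
            continue  # no value in this block can qualify, so skip it entirely
        scanned += 1
        matches += sum(1 for v in block.values if v > threshold)
    return matches, scanned


# Hypothetical sales amounts, loosely ordered as they might be by load time.
sales = [5, 7, 9, 12, 14, 18, 40, 45, 51, 95, 97, 99]
blocks = build_blocks(sales, block_size=3)
matches, scanned = count_greater_than(blocks, threshold=50)
print(matches, scanned)
```

In this toy run only two of the four blocks are actually read. The payoff of the approach depends on how well the data is clustered: the more tightly each block's values are grouped, the more blocks a query can rule out from the summaries alone.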
Another industry-first advance in BLU Acceleration is called “actionable compression,” where data no longer has to be decompressed to be analyzed.
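How can data be analyzed without being decompressed? The announcement does not say how BLU Acceleration does it, so the following Python sketch is only an illustration under one assumption: the column is dictionary-encoded, with integer codes assigned in sorted order. Because code order then matches value order, a comparison can be evaluated directly on the compressed codes. The function names `encode` and `count_at_least` and the sample city list are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary-encode a column with codes assigned in sorted order.

    Because the dictionary is sorted, code order matches value order.
    """
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def count_at_least(dictionary: List[str], codes: List[int], bound: str) -> int:
    """Count rows whose value >= bound, comparing only integer codes."""
    # Translate the predicate once: the first code whose value is >= bound.
    min_code = bisect_left(dictionary, bound)
    # The scan itself never looks up the original strings.
    return sum(1 for code in codes if code >= min_code)


cities = ["Austin", "Troy", "Boston", "Austin", "Yorktown", "Troy"]
dictionary, codes = encode(cities)
print(codes)
print(count_at_least(dictionary, codes, "Troy"))
```

The predicate is translated into code space once, and the scan then touches only small integers. That is the general reason a scheme like this can save both memory and CPU time.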
Not IBM’s First Big Data Rodeo
The new offerings expand what is already the industry’s deepest portfolio of Big Data technologies and solutions, spanning software, services, research and hardware. The IBM Big Data platform combines traditional data warehouse technologies with new Big Data techniques, such as Hadoop, stream computing, data exploration, analytics and enterprise integration, to create an integrated solution to address these critical needs.
IBM PureData System for Hadoop is the next step forward in IBM’s overall strategy to deliver a family of systems with built-in expertise that leverages its decades of experience reducing the cost and complexity associated with information technology.
This new system integrates IBM InfoSphere BigInsights, which allows companies of all sizes to cost-effectively manage and analyze data and add administrative, workflow, provisioning and security features, along with best-in-class analytical capabilities from IBM Research.
Today’s announcement also includes the following new versions of IBM’s Big Data solutions:
- A new version of InfoSphere BigInsights, IBM’s enterprise-ready Hadoop offering, which makes it simpler to develop applications using existing SQL skills and adds the compliance, security and high availability features vital for enterprise applications. BigInsights offers three entry points: free download, enterprise software and now an expert integrated system, IBM PureData System for Hadoop.
- A new version of InfoSphere Streams, unique “stream computing” software that enables massive amounts of data in motion to be analyzed in real-time, with performance improvements, and simplified application development and deployment.
- A new version of Informix including TimeSeries Acceleration for operational reporting and analytics on smart meter and sensor data.
Pricing and Availability
All offerings are available in Q2, except the PureData System for Hadoop, which will start shipping to customers in the second half of 2013. Credit-qualified clients can take advantage of simple, flexible lease and loan packages with no up-front payments for the software and systems that deliver a new generation of data analytics.
IBM Global Financing offers attractive leasing programs with 90-day payment deferrals for the PureData System for Hadoop, as well as zero percent loans for the broader portfolio of IBM big data solutions.
Big Data, Big Security, Big Boxes
There have been some substantial “Big Data” announcements over the past week from Big Blue.
Late last week, on the heels of the public disclosure of security breaches at a number of major media organizations, including The New York Times, The Wall Street Journal, and the Washington Post, IBM announced its new “IBM Security Intelligence With Big Data” offering, which combines leading security intelligence with big data analytics capabilities for both external cyber security threats and internal risk detection and protection.
You can learn more about that offering here.
IBM is also working to make it easier for organizations to quickly adopt and deploy big data and cloud computing solutions.
Today, the company announced major advances to its PureSystems family of expert integrated systems.
Now, organizations challenged by limited IT skills and resources can quickly comb through massive volumes of data and uncover critical trends that can dramatically impact their business.
The new PureSystems models also help to remove the complexity of developing cloud-based services by making it easier to provision, deploy and manage a secure cloud environment.
Together, these moves by IBM further extend its leadership in big data and next generation computing environments such as cloud computing, while opening up new opportunities within growth markets and with organizations such as managed service providers (MSPs).
Big Data Only Getting Bigger
Across all industries and geographies, organizations of various sizes are being challenged to find simpler and faster ways to analyze massive amounts of data and better meet client needs.
According to IDC, the market for big data technology and services will reach $16.9 billion by 2015, up from $3.2 billion in 2010.
At the same time, an IBM study found that almost three-fourths of leaders surveyed indicated their companies had piloted, adopted or substantially implemented cloud in their organizations — and 90 percent expect to have done so in three years. While the demand is high, many organizations do not have the resources or skills to embrace it.
Today’s news includes PureData System for Analytics to capitalize on big data opportunities; a smaller PureApplication System to accelerate cloud deployments for a broader range of organizations; PureApplication System on POWER7+ to ease management of transaction and analytics applications in the cloud; additional options for MSPs across the PureSystems family including flexible financing options and specific MSP Editions to support new services models; and SmartCloud Desktop Infrastructure to ease management of virtual desktop solutions.
New Systems Tuned for Big Data
The new IBM PureData System for Analytics, powered by Netezza technology, features 50 percent greater data capacity per rack and is able to crunch data 3x faster, making this system a top performer, while also addressing the challenges of big data.
The IBM PureData System for Analytics is designed to assist organizations with managing more data while maintaining efficiency in the data center – a major concern for clients of all sizes.
With IBM PureData System for Analytics, physicians can analyze patient information faster and retailers can gain better insight into customer behavior. The New York Stock Exchange (NYSE) relies on PureData System for Analytics to handle an enormous volume of data in its trading systems and to identify and investigate trading anomalies more quickly and easily.
You can learn more about these and other new PureSystems capabilities here.
To aid in the detection of stealthy threats that can hide in the increasing mounds of data, IBM recently announced IBM Security Intelligence with Big Data, combining leading security intelligence with big data analytics capabilities for both external cyber security threats and internal risk detection and prevention. IBM Security Intelligence with Big Data provides a comprehensive approach that allows security analysts to extend their analysis well beyond typical security data and to hunt for malicious cyber activity.
Watson Heads Back To School
Well, the introduction of the BlackBerry 10 OS has come and gone, Research In Motion renamed itself as “BlackBerry,” the new company announced two new products, and the market mostly yawned.
Then again, many in the market seemed to find something to love about either the new interface and/or the new devices. David Pogue, The New York Times’ technology columnist (who typically leans towards being a Machead), wrote a surprisingly favorable review. Today, he opined again in a post entitled “More Things To Love About The BlackBerry 10.”
With that kind of ink, don’t vote the tribe from Ottawa off of the island just yet!
As I pondered the fate of the BlackBerry milieu, it struck me I hadn’t spilled any ink lately myself about IBM’s Watson, who’s been studying up on several industries since beating the best humans in the world two years ago at “Jeopardy!”
Turns out, Watson’s also been looking to apply to college, most notably, Rensselaer Polytechnic Institute. Yesterday, IBM announced it would be providing a modified version of an IBM Watson system to RPI, making it the first university to receive such a system.
The arrival of Watson will give RPI students and faculty an opportunity to find new uses for Watson and deepen the system’s cognitive computing capabilities. The firsthand experience of working on the system will also better position RPI students as future leaders in the Big Data, analytics, and cognitive computing realms.
Watson has a unique ability to understand the subtle nuances of human language, sift through vast amounts of data, and provide evidence-based answers to its human users’ questions.
Currently, Watson’s fact-finding prowess is being applied to crucial fields, such as healthcare, where IBM is collaborating with medical providers, hospitals and physicians to help doctors analyze a patient’s history, symptoms and the latest news and medical literature to help physicians make faster, more accurate diagnoses. IBM is also working with financial institutions to help improve and simplify the banking experience.
Rensselaer faculty and students will seek to further sharpen Watson’s reasoning and cognitive abilities, while broadening the volume, types, and sources of data Watson can draw upon to answer questions. Additionally, Rensselaer researchers will look for ways to harness the power of Watson for driving new innovations in finance, information technology, business analytics, and other areas.
With 15 terabytes of hard disk storage, the Watson system at Rensselaer will store roughly the same amount of information as its Jeopardy! predecessor and will allow 20 users to access the system at once, creating an innovation hub for the institute’s New York campus. Along with faculty researchers and graduate students, undergraduate students at Rensselaer will have opportunities to work directly with the Watson system. This experience will help prepare Rensselaer students for future high-impact, high-value careers in analytics, cognitive computing, and related fields.
Underscoring the value of the partnership between IBM and Rensselaer, Gartner, Inc. estimates that 1.9 million Big Data jobs will be created in the U.S. by 2015.
This workforce — which is in high demand today — will require professionals who understand how to develop and harness data-crunching technologies such as Watson, and put them to use for solving the most pressing of business and societal needs.
As part of a Shared University Research (SUR) Award granted by IBM Research, IBM will provide Rensselaer with Watson hardware, software and training. The ability to use Watson to answer complex questions posed in natural language with speed, accuracy and confidence has enormous potential to help improve decision making across a variety of industries from health care, to retail, telecommunications and financial services.
IBM and Rensselaer: A History of Collaboration
Originally developed at the company’s Yorktown Heights, N.Y. research facility, IBM’s Watson has deep connections to the Rensselaer community. Several key members of IBM’s Watson project team are graduates of Rensselaer, the oldest technological university in the United States.
Leading up to Watson’s victory on Jeopardy!, Rensselaer was one of eight universities that worked with IBM in 2011 on the development of open architecture that enabled researchers to collaborate on the underlying QA capabilities that help to power Watson.
Watson is the latest collaboration between IBM and Rensselaer, which have worked together for decades to advance the frontiers of high-performance computing, nanoelectronics, advanced materials, artificial intelligence, and other areas. IBM is a key partner of the Rensselaer supercomputing center, the Computational Center for Nanotechnology Innovations, where the Watson hardware will be located.
Flanked by the avatar of IBM’s Watson computer, IBM Research Scientist Dr. Chris Welty (left) and Rensselaer Polytechnic Institute student Naveen Sundar discuss potential new ways the famous computer could be used, Wednesday, January 30, 2013 in Troy, NY. IBM donated a version of its Watson system to Rensselaer, making it the first university in the world to receive such a system. Rensselaer students and faculty will explore new uses for Watson and ways to deepen its cognitive computing capabilities. (Philip Kamrass/Feature Photo Service for IBM)
IBM To Acquire StoredIQ
IBM today announced it has entered into a definitive agreement to acquire StoredIQ Inc., a privately held company based in Austin, Texas.
Financial terms of the deal were not disclosed.
StoredIQ will advance IBM’s efforts to help clients derive value from big data and respond more efficiently to litigation and regulations, dispose of information that has outlived its purpose and lower data storage costs.
With this agreement, IBM adds to its prior investments in Information Lifecycle Governance. The addition of StoredIQ capabilities enables clients to find and use unstructured information of value, respond more efficiently to litigation and regulatory events and lower information costs as data ages.
IBM’s Information Lifecycle Governance suite improves information economics by helping companies lower the total cost of managing data while increasing the value derived from it by:
- Eliminating unnecessary cost and risk with defensible disposal of unneeded data
- Enabling businesses to realize the full value of information as it ages
- Aligning cost to the value of information
- Reducing information risk by automating privacy, e-discovery, and regulatory policies
Adding StoredIQ to IBM’s Information Lifecycle Governance suite gives organizations more effective governance of the vast majority of data, including efficient electronic discovery and its timely disposal, to eliminate unnecessary data that consumes infrastructure and elevates risk.
As a result, business leaders can access and analyze big data to gain insights for better decision-making. Legal teams can mitigate risk by meeting e-discovery obligations more effectively. Also, IT departments can dispose of unnecessary data and align information cost to value to take out excess costs.
What Does StoredIQ Software Do?
StoredIQ software provides scalable analysis and governance of disparate and distributed email as well as file shares and collaboration sites. This includes the ability to discover, analyze, monitor, retain, collect, de-duplicate and dispose of data.
In addition, StoredIQ can rapidly analyze high volumes of unstructured data and automatically dispose of files and emails in compliance with regulatory requirements.

StoredIQ brings powerful, innovative capabilities to govern data in place to drive value up and cost out.
“CIOs and general counsels are overwhelmed by volumes of information that exceed their budgets and their capacity to meet legal requirements,” said Deidre Paknad, vice president of Information Lifecycle Governance at IBM. “With this acquisition, IBM adds to its unique strengths as a provider able to help CIOs and attorneys rapidly drive out excess information cost and mitigate legal risks while improving information utility for the business.”
Named a 2012 Cool Vendor by Gartner, StoredIQ has more than 120 customers worldwide, including global leaders in financial services, healthcare, government, manufacturing and other sectors. Other systems require months to index data and years to configure, install and address information governance. StoredIQ can be up and running in just hours, immediately helping clients drive out cost and risk.
IBM intends to incorporate StoredIQ into its Software Group and its Information Lifecycle Governance business.
Building on prior acquisitions of PSS Systems in 2010 and Vivisimo in 2012, IBM adds to its strength in rapid discovery, effective governance and timely disposal of data. The acquisition of StoredIQ is subject to customary closing conditions and is expected to close in the first quarter of 2013.
Go here for more information on IBM’s Information Lifecycle Governance suite, and here for more information on IBM’s big data platform.
The Vindication Of Nate Silver
I was all set to write a closer examination of statistician and blogger Nate Silver’s most recent election predictions, during the run-up to which he was lambasted by a variety of mostly conservative voices for either being politically biased or resting his predictions on a loose set of statistical shingles.
Only to be informed that one of my esteemed colleagues, David Pittman, had already written such a compendium post. So hey, why reinvent the Big Data prediction wheel?
Here’s a link to David’s fine post, which I encourage you to check out if you want to get a sense of how electoral predictions provide an excellent object lesson for the state of Big Data analysis. (David’s post also includes the on-camera interview that Scott Laningham and I conducted with Nate Silver just prior to his excellent keynote before the gathered IBM Information On Demand 2012 crowd.)
I’m also incorporating a handful of other stories I have run across that I think do a good job of helping people better understand the inflection point for data-driven forecasting that Silver’s recent endeavor represents, along with its broader impact in media and punditry.
They are as follows:
“Nate Silver’s Big Data Lessons for the Enterprise”
“What Nate Silver’s success says about the 4th and 5th estates”
“Election 2012: Has Nate Silver destroyed punditry?”
“Nate Silver After the Election: The Verdict”
As a Forbes reporter wrote in his own post about Silver’s predictions, “the modelers are here to stay.”
Moving forward, I expect we’ll inevitably see an increased capability for organizations everywhere to adopt Silver’s methodical, Bayesian analytical strategies…and well beyond the political realm.
Live @ Information On Demand 2012: Watson’s Next Job
As I mentioned in my last post, yesterday was day 3 of Information On Demand 2012 here in Las Vegas.
There was LOTS going on out here in the West.
We started the day by interviewing keynote speaker Nate Silver (see previous post) just prior to his going on stage for the morning general session. Really fascinating interview, and going into it I learned that his book had reached #8 on The New York Times best seller list.

In the IOD 2012 day 3 general session, IBM Fellow Rob High explains how IBM’s Watson technology may soon help drive down call center costs by 50%, using the intelligence engine of Watson to help customer service reps respond faster to customer queries.
So congrats, Nate, and thanks again for a scintillating interview.
During the morning session, we also heard from IBM’s own Craig Rinehart about the opportunity for achieving better efficiencies in health care using enterprise content management solutions from IBM.
I nearly choked when Craig explained that thirty cents out of every dollar spent on healthcare in the U.S. is wasted, and that despite spending more than any other country, the U.S. is ranked 37th in terms of care.
Craig explained the IBM Patient Care and Insights tool was intended to bring advanced analytics out of the lab and into the hospital to help start driving down some of those costs, and more importantly, to help save lives.
We also heard from IBM Fellow and CTO of IBM Watson Solutions’ organization, Rob High, about some of the recent advancements made on the Watson front.
High explained the distinction between programmatic and cognitive computing, the latter being the direction computing is now taking, and an approach that provides for much more “discoverability” even as it’s more probabilistic in nature.
High walked through a fascinating call center demonstration, whereby Watson helped a call center agent more quickly respond to a customer query by filtering through thousands of possible answers in a few seconds, then homing in on the ones most likely to answer the customer’s question.
Next, we heard from Jeff Jonas, IBM’s entity analytics “Ironman” (Jeff also just completed his 27th Ironman triathlon last weekend), who explained his latest technology, context accumulation.
Jeff observed that context accumulation was the “incremental process of integrating new observations with previous ones.”
Or, in other words, developing a better understanding of something by taking more into account the things around it.
Too often, Jeff suggested, analytics has been done in isolation, but “the future of Big Data is the diverse integration of data” where “data finds data.”
His new method allows for self-correction and a high tolerance for disagreement, confusion and uncertainty, and it lets new observations “reverse earlier assertions.”
For now, he’s calling the technology “G2,” and explains it as a “general purpose context accumulating engine.”
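For the programmers in the audience, here is a toy Python sketch of what “integrating new observations with previous ones” might look like. To be clear, this is not G2, whose internals Jeff did not walk through here. It is a hypothetical illustration that assumes each observation is just a set of features (a name, a phone number, an email) and that two observations sharing any feature describe the same entity. The class `ContextAccumulator` and all of the sample data are made up.

```python
from typing import FrozenSet, List, Set


class ContextAccumulator:
    """Toy engine: entities are sets of features built up one observation at a time."""

    def __init__(self) -> None:
        self.entities: List[Set[str]] = []

    def observe(self, features: FrozenSet[str]) -> None:
        """Integrate a new observation with everything seen so far."""
        # Find every existing entity that shares at least one feature.
        related = [e for e in self.entities if e & features]
        merged: Set[str] = set(features)
        for entity in related:
            merged |= entity
        # Replace the related entities with a single merged one. If two or
        # more were related, the earlier assertion that they were distinct
        # has just been reversed.
        self.entities = [e for e in self.entities if not (e & features)]
        self.entities.append(merged)


engine = ContextAccumulator()
engine.observe(frozenset({"name:J. Smith", "phone:555-0100"}))
engine.observe(frozenset({"name:John Smith", "email:js@example.com"}))
count_before = len(engine.entities)   # two apparently unrelated people
engine.observe(frozenset({"phone:555-0100", "email:js@example.com"}))
count_after = len(engine.entities)    # the new observation links them
print(count_before, count_after)
```

After the first two observations the engine believes it has seen two different people. The third observation shares a feature with each, so the two records collapse into one. This sketch only ever merges; a fuller treatment of self-correction would also need a way to split entities apart when later evidence contradicts a match.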
Of course, there was also the Nate Silver keynote, the capstone of yesterday’s morning session, for which I’ll refer you back to the interview Scott and I conducted to get a summary taste of all the ideas Nate discussed. Your best bet is to buy his book, if you really want to understand where he thinks we need to take the promise of prediction.
(Almost) Live @ Information On Demand 2012: A Q&A With IBM’s Jeff Jonas

Jeff Jonas sat down last evening with Scott and me in the Information On Demand 2012 Solutions EXPO to chat about privacy in the Big Data age, and also gave a sneak look into the new “Context Accumulation” technology he’s been working on.
You really ought to get to know IBM’s Jeff Jonas.
As chief scientist of the IBM Entity Analytics group and an IBM Distinguished Engineer, Jeff has been instrumental in driving the development of some ground-breaking technologies, during and prior to IBM’s acquisition of his company, Systems Research & Development (SRD), which Jonas founded in 1984.
SRD’s portfolio included technology used by the surveillance intelligence arm of the gaming industry, which leveraged facial recognition to protect casinos from aggressive card counting teams (never mind the great irony that IBM’s Yuchun Lee was once upon a time one of those card counters; I think we need to have an onstage interview between those two someday, and I volunteer to conduct it!).
Today, possibly half the casinos in the world use technology created by Jonas and his SRD team, work frequently featured on the Discovery Channel, Learning Channel, and the Travel Channel.
Following an investment in 2001 by In-Q-Tel, the venture capital arm of the CIA, SRD also played a role in America’s national security and counterterrorism mission. One such contribution includes a unique analysis of the connections between the 9/11 terrorists.
This “link analysis” is so unique that it is taught in universities and has been widely cited by think tanks and the media, including an extensive one-on-one interview with Peter Jennings for ABC PrimeTime.
Following IBM’s acquisition of SRD, these Jonas-inspired innovations continue to create big impacts on society, including the arrest of over 150 child pornographers and the prevention of a national security risk posed against a significant American sporting event.
This technology also assisted in the reunification of over 100 loved ones separated by Hurricane Katrina and at the same time was used to prevent known sexual offenders from being co-located with children in emergency relocation facilities.
Jonas is also somewhat unique as a technologist in that he frequently engages with those in the privacy and civil liberties community. The essential question: How can government protect its citizens while preventing the erosion of long-held freedoms like the Fourth Amendment? With privacy in mind, Jonas invented software which enables organizations to discover records of common interest (e.g., identities) without the transfer of any privacy-invading content.
That’s about where we start this interview with Jeff Jonas, so I’ll let Scott and myself take it from there…