Pre-built Analytic Modules Will Drive AI Revolution in Industry

By Bill Schmarzo, CTO, Big Data Practice of EMC Global Services

What is the Intelligence Revolution equivalent to the 1/4” bolt?

I asked this question in the blog "How History Can Prepare Us for Upcoming AI Revolution?" when trying to understand what history can teach us about technology-induced revolutions. One of the key capabilities of the Industrial and Information Revolutions was the transition from labor-intensive, hand-crafted solutions to mass-manufactured ones. In the Information Revolution, it was the creation of standardized database management systems, middleware and operating systems. For the Industrial Revolution, it was the creation of standardized parts – like the ¼" bolt – that could be used to assemble, rather than hand-craft, solutions. So, what is the ¼" bolt equivalent for the AI Revolution? I think the answer is analytic engines, or Analytic Modules.

Analytic Modules are pre-built engines – think Lego blocks – that can be assembled to create specific business and operational applications. These Analytic Modules would have the following characteristics:

  • Pre-defined data input definitions and a data dictionary (so the module knows what type of data it is ingesting, regardless of the origin of the source system).
  • Pre-defined data integration and transformation algorithms to cleanse, align and normalize the data.
  • Pre-defined data enrichment algorithms to create the higher-order metrics (e.g., reach, frequency, recency, indices, scores) required by the analytic model.
  • Algorithmic models (built using advanced analytics such as predictive analytics, machine learning or deep learning) that take the transformed and enriched data, run the algorithmic model and generate the desired outputs.
  • A layer of abstraction (perhaps using the Predictive Model Markup Language, or PMML[1]) above the predictive analytics, machine learning and deep learning frameworks that allows application developers to pick their preferred or company-mandated standards.
  • An orchestration capability to "call" the most appropriate machine learning or deep learning framework based upon the type of problem being addressed. See Keras, a high-level neural networks API written in Python and capable of running on top of popular machine learning frameworks such as TensorFlow, CNTK or Theano (a minimal sketch follows this list).
  • Pre-defined outputs (APIs) that feed the analytic results to downstream operational systems (e.g., operational dashboards, manufacturing, procurement, marketing, sales, support, services, finance).
  • In short, Analytic Modules produce pre-defined analytic results or outcomes while providing a layer of abstraction that enables the orchestration and optimization of the underlying machine learning and deep learning frameworks.
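To make the orchestration idea concrete, here is a minimal sketch of what such a module's model layer might look like, built with Keras on the TensorFlow backend. The synthetic features, labels and network shape are illustrative assumptions, not part of the article's specification.

```python
# A minimal sketch only: in a real Analytic Module the inputs would come from
# the pre-defined data definitions and the outputs would flow through its APIs.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical enriched features (e.g., recency/frequency scores) and a
# binary "at-risk" label, standing in for a predictive-maintenance problem.
X = np.random.rand(1000, 8).astype("float32")
y = (X[:, 0] + X[:, 3] > 1.0).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# The module's pre-defined output API would publish these scores to
# downstream operational systems rather than printing them.
print(model.predict(X[:5], verbose=0))
```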
Monetizing IoT with Analytic Modules

The BCG Insights report titled "Winning in IoT: It's All About the Business Processes" highlighted the top 10 IoT use cases that will drive IoT spending, including predictive maintenance, self-optimized production, automated inventory management, fleet management, and distributed generation and storage (see Figure 1 in the report).

But these IoT applications will be more than just reports and dashboards that monitor what is happening. They'll be "intelligent" – learning with every interaction to predict what's likely to happen and prescribe corrective action to prevent costly, undesirable and/or dangerous situations – and they will form the foundation for an organization's self-monitoring, self-diagnosing, self-correcting and self-learning IoT environment.

While this is a very attractive list of IoT applications to target, treating any of these use cases as a single application is a huge mistake. It's like a return to the big-bang IT projects of the ERP, MRP and CRM days, when tens of millions of dollars were spent in hopes that, two to three years later, something of value would materialize.

Instead, these "intelligent" IoT applications will be composed of Analytic Modules integrated to address the key business and operational decisions those applications must support. For example, think of predictive maintenance as an assembly of Analytic Modules addressing predictive maintenance decisions such as:

  • predicting at-risk component failures.
  • optimizing resource scheduling and staffing.
  • matching technicians and inventory to the maintenance and repair work to be done.
  • ensuring tools and repair equipment availability.
  • optimizing first-time-fix rates.
  • optimizing parts and MRO inventory.
  • predicting component fixability.
  • optimizing the logistics of parts, tools and technicians.
  • leveraging cohort analysis to improve service and repair predictability.
  • leveraging event association analysis to determine how weather, economic and special events impact device and machine maintenance and repair needs.

As I covered in the blog "The Future Is Intelligent Apps," the only way to create intelligent applications is a methodical approach that starts the predictive maintenance hypothesis development process with identifying, validating, valuing and prioritizing the decisions (or use cases) that comprise these intelligent applications.

Read the source article in Data Science Central.

Data Science on a Budget: Audubon’s Advanced Analytics

On Memorial Day weekend 2038, when your grandchildren visit the California coast, will they be able to spot a black bird with a long orange beak called the Black Oystercatcher? Or will that bird be long gone? Will your grandchildren only be able to see that bird in a picture in a book or on a website?

A couple of data scientists at the National Audubon Society have been examining the question of how climate change will impact where birds live in the future, and the Black Oystercatcher has been identified as a “priority” bird — one whose range is likely to be impacted by climate change.

How did Audubon determine this? It’s a classic data science problem.

First, consider birdwatching itself, which is pretty much good old-fashioned data collection. Hobbyists go out into the field, identify birds by species and gender and sometimes age, and record their observations on their bird lists or bird books, and more recently on their smartphone apps.

Audubon itself has sponsored an annual crowdsourced data collection event for more than a century — the Audubon Christmas Bird Count — providing the organization with an enormous dataset of bird species and their populations in geographies across the country at specific points in time. The event is 118 years old and has produced one of the longest-running data sets on birds in the world.

That’s one of the data sets that Audubon used in its project that looks at the impact of climate change on bird species’ geographical ranges, according to Chad Wilsey, director of conservation science at Audubon. He spoke with InformationWeek in an interview. Wilsey is an ecologist, and not trained as a data scientist. But like many scientists, he uses data science as part of his work. In this case, as part of a team of two ecologists, he applied statistical modeling using technologies such as R to multiple data sets to create the predictive models for future geographical ranges for specific bird species. The results are published in the 2014 report, Audubon’s Birds and Climate Change. Audubon also published interactive ArcGIS maps of species and ranges to its website.

The initial report used Audubon's Christmas Bird Count data set and the North American Breeding Bird Survey from the US government. The report assessed geographic range shifts through the end of the century for 588 North American bird species during both the summer and winter seasons under a range of future climate change scenarios. Wilsey's team built models based on climatic variables such as historical monthly temperature and precipitation averages and totals. The team built the models using boosted regression trees, a machine learning technique. These models were built with bird observations and climate data from 2000 to 2009 and then evaluated with data from 1980 to 1999.
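As a rough illustration of that approach (not Audubon's actual R scripts), here is a minimal Python sketch of a boosted-regression-tree species model trained on one period of observations and evaluated on another; the climate variables and the presence/absence rule are synthetic assumptions made purely for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
obs = pd.DataFrame({
    "mean_temp": rng.normal(10, 5, n),        # historical monthly temperature (synthetic)
    "total_precip": rng.gamma(2.0, 30.0, n),  # historical monthly precipitation (synthetic)
    "period": rng.choice(["2000-2009", "1980-1999"], n),
})
# Synthetic presence/absence of a species, loosely tied to the climate variables.
obs["present"] = (obs["mean_temp"].between(5, 15) & (obs["total_precip"] > 40)).astype(int)

train = obs[obs["period"] == "2000-2009"]   # build with one period of observations...
test = obs[obs["period"] == "1980-1999"]    # ...and evaluate against another
features = ["mean_temp", "total_precip"]

# Boosted regression trees: an ensemble of shallow trees, each fit to the
# errors of the trees before it.
brt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3)
brt.fit(train[features], train["present"])

print("AUC on hold-out period:",
      roc_auc_score(test["present"], brt.predict_proba(test[features])[:, 1]))
```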

“We write all our own scripts,” Wilsey told me. “We work in R. It is all machine learning algorithms to build these statistical models. We were using very traditional data science models.”

Audubon did all this work on an on-premises server with 16 CPUs and 128 GB of RAM.

Read the source article in InformationWeek.com.

Thought Leadership: Nathaniel Gates, CEO, Alegion

Companies With Large AI Projects are Calling Alegion to Help Prepare Needed Data via Crowdsourcing

Nathaniel Gates is CEO of Alegion, which he co-founded in 2012. Prior to Alegion, he founded Cloud49, a cloud computing solutions provider focused on the public sector. He lived and worked in Alaska for 36 years before moving to Austin, Texas in 2012. He recently spoke with AI Trends Editor John P. Desmond.

Q. What is the mission of Alegion?

A. The mission of Alegion is to get work done and get work distributed out to a workforce across the world and across this country. We do that by providing a service, and that service is to bring exceptional human and machine intelligence to bear against very large business challenges, largely around transitioning to an AI, data-driven environment. Our goal is to hold the hand of customers and take them over the bridge to AI. We meet them on the near side of it, which is human judgment and human intelligence, and we cross the bridge with them to the point where we can hand off to machine intelligence, where they can do the work with confidence and quality. That's really the goal of our engagements with our customers.

Q. Can you talk about your crowdsourcing approach?

A. Sure. When Alegion first got started, about six years ago now, we were helping organizations leverage public crowds to get work done, for example by using Mechanical Turk, an Amazon crowd. That's a workforce made up of hundreds of thousands of people all over the world who have unique skills and abilities that can be brought to bear against different business challenges. A business might need to categorize receipts, tag photos, or moderate user content as it gets uploaded to a website. Well, that work can be spread out and split up amongst these workforces to get the work done at incredible scale and with elasticity. And what I mean by those two words: the scale means it doesn't scare us to address millions of records simultaneously, and we can do that by scaling up the workforce immediately to tackle a challenge. When the work starts to fade away, the workforce gets assigned to other work. We call that elasticity, where we can go up and down based on demand.

And so that crowdsourcing was Alegion's business model, and had been for several years, up until about two years ago. Then we started looking at the preponderance of our customers who were using our crowd to develop training data for their AI projects. And what I mean by training data is that the models used by organizations for artificial intelligence oftentimes require a supervised data set. That is a data set that has been validated and is accurate, so that the model can make the right decision in the right circumstances. Just like we train our toddlers and tell them not to go near the hot stove or tell them to clean their room, we train them and set the boundaries, and they can lean on that training and past performance when it comes time to make future judgments. How that relates to AI is, if you're developing a self-driving car and you want the car to know that what it sees in front of it is a child chasing a ball, that has to be a trained response. It has to have seen many instances in which a ball rolls in front of that car and somebody has labeled that as a hazard. Thus the machine is trained to know that it needs to make a judgment. So these organizations have been using crowdsourcing for some time, and at an increasing pace, to create very large training sets for the AI initiatives within their companies.

Q. You mentioned your business model changed about two years ago?

A. It really has. We've gone from being a services-based company using the crowd to work on any kind of project, to being focused only on providing services related to artificial intelligence efforts. That allows us to develop our own software and our own processes to ensure that those training data sets are developed at very high quality and are very specific to the customer company and its industry. So we are more focused, and we have developed software and a platform to help us do the work with repeatable high quality. That was a pretty big change for our organization. Fundamentally, though, it didn't change the business model: we are still distributing large amounts of work out to workforces, and we at Alegion are ensuring that we give high-quality results back to the customer.

Q. How do you sell your services and what industries do you target?

A. Our business model is software as a service. Organizations subscribe to our platform, and our platform allows them to create training data. It could be data that they don’t have. They may want to go collect data or find data relative to what it is they’re trying to do. If the company is building a recommendation engine for what videos customers should watch, they might want to find 100,000 types of movies and then classify those. And so from that standpoint, we’d be collecting data and then classifying that data so that it could be ingested by their model.

Most companies already have their own data, but the data is not suitable for training because it has not been properly classified. So a large company like Coca-Cola might have huge amounts of records for bottling companies and distribution entities. But if they want to put together a predictive algorithm for where people are going to buy soda in the future and what other flavors would be congruent with past performance, they would need a company like Alegion to go through the huge amounts of data they already have and apply structure to it. That structure then organizes the data in a format that can be processed by the algorithms. That's what we do in those cases.

The targeted industries are varied. I would say the targeted function that we're going after is artificial intelligence initiatives within larger enterprise organizations. The industry has a little less relevance. Some of our largest customers are financial services organizations or government contractors. We have large customers that are media and publishing companies. What bonds them all together is that they have high volumes of data that they're trying to describe and structure. And crowdsourcing on our platform is a fantastic way to do that with an assurance of quality.

Q. How’s the company doing?

A. The company is doing great. We saw about a 10x increase in the 12 months of 2017, and about 80% of that was attributable to artificial intelligence. We’re seeing new companies with larger projects they want to do around the concept of AI training. So this growth wave we are in right now is pretty exciting, and we also know it’s very early. Everyone is still trying to figure out which way is up with artificial intelligence, how it could potentially disrupt their industry, and whether they could be the catalyst for that or whether they will get caught from behind. And so the vast majority of this market is yet to come.

Q. So can you describe the AI that you use a little bit?

A. Sure. So we interact with AI in two different spheres. First of all, the customer has their own AI that they’re pursuing. And that’s what they’re trying to train. We don’t have a great deal of visibility into that. So, it might be an insurance company that is building an application where you can take a picture of a car and it could assess the amount of damage to the vehicle. That artificial intelligence is the customer’s responsibility. All they need us to do is to have workers draw squares around the damage, and classify how much damage there is for the training purposes.

Alegion also uses AI machine learning internally to improve our own processes and quality. So we can look at an image and use our own algorithms to determine whether the vehicle has damage. Do we think that photo is appropriate or do we not think it’s appropriate before the human ever sees it? And then as that model gets better and better, we are less dependent upon even our own humans. So we’re improving models in order to drive high-quality training data for our customers. Our customers in turn take that training data and build fantastic new artificial intelligence that’s transformative in their market.

Q. Many companies today are trying to monetize their digital content. Are any companies you are working with doing this?

A. Yes. We work with publishing companies who have a tremendous volume of digital assets, such as magazine photography or newspaper columns and archives. They may want to get that data into syndication so it can be monetized. In order to do that effectively, there has to be the right metadata so that a person searching for that content can find it easily. We might have to tag a photo with different attributes, for instance. So a customer would come to us with a specific taxonomy and say, "Please go through our million photos of the last 10 years and apply this taxonomy to it." And by taxonomy, I mean the structure of the data classification system. That way, later on, when their customer is looking for a specific photo, they can search "man with a green hat" and find all the photos of a man with a green hat that are available for syndication. That's one way you can monetize an asset: by adding metadata to it so it's searchable.

Another way to monetize an asset is to offer like or similar assets based on what the customer is browsing. We see this a lot in things like Netflix, where you've just watched a science fiction movie and, lo and behold, Netflix would like to see if you'd like to watch this other science fiction movie because it has a similar affinity. This requires and depends upon metadata, and that metadata is data that describes the underlying data. It's one thing to say, "I have a phone on my desk." It's another thing to say, "I have a black Polycom with the handset." The more metadata that describes it, the more I can link it to other things that have similar affinities. An interesting point is that many customers come to us for cost-savings initiatives. They would like to use artificial intelligence or the crowd to help sell their product at a lower price or make their customer service less expensive.

We have customers with 10 years of recordings for customer service that they want to turn into a chatbot, in order to reduce the number of customer service people they need by, say, 50%. That's success to them because they've reduced their customer service cost without, hopefully, sacrificing customer experience. But what comes out of it, if we do our job right and are able to classify the data and the conversations well enough, is revenue-producing opportunities for the customer. The chatbot is able to lean back on the context and the associations we have identified in the historical data to know that, in the context of this conversation, this customer could be very well suited to buy a certain product. And the chatbot can actually suggest that.

A machine intelligence, in exactly the same way as a human intelligence, can look at the totality of the data that trained it in the first place and find a correlation with someone with a similar footprint who was interested in certain services. So even though these customers are typically knocking on our door looking for cost savings, we tend to find them revenue-producing opportunities with artificial intelligence. And that's what becomes transformative in the space.

“We are an artificial intelligence services company.”

Q. What would you say is Alegion’s differentiation in the market and who is the competition?

A. A small group of us are doing the majority of this work right now; we're just the tip of the iceberg. Our competition includes CrowdFlower, Mighty AI and CloudFactory. These are all companies who use crowdsourcing to build fantastic training data for their customers. We differentiate ourselves from these folks on a couple of fronts. We take a bit of a service-first approach rather than just giving a customer a command console and saying, "Go make your training data." We usually wrap that with a customer success team, and that customer success team owns the onus of quality for the customer. So we would give an SLA [service level agreement] to say, "We will return your data to you at this level of quality, and we will do whatever services are required in addition to our software to get it there." And we find that customers are very interested in not having to learn another piece of software. They would love just to have their problem solved. So that's one differentiator that you'll see for us.

A second one is that we are the most suited to offer higher-security solutions. Customers, rightly so, are very nervous about who has access to their data, in part because sometimes their data contains their own customers' data. So how can they give that data to a company like Alegion, which is going to add metadata to it, without putting the data at risk? Alegion has multiple stages of secure data approaches that meet the customer's internal regulatory or governance requirements for security. So Alegion is the secure crowd approach. We can do a number of different things to ensure that the data never leaves the customer site, is never unencrypted, is always audited, and that the customer can pass the different regulatory hurdles as necessary.

Q.  How do you price your product or your service?

A. It comes in a couple different flavors depending on the customer’s requirements. If the customer has a one-and-done project and they just want to train a model for that project, we just give them a price to do that. If the customer has multiple projects that they foresee, then we would sell them a subscription to our software and then we would have an added fee for whatever services of Alegion they use.

Q. Bias is a challenging issue for businesses that want to strategically deploy AI. How do you deal with bias from your perspective?

A. We see a lot of bias, and the challenge of bias, in our customers. They will have developed data sets, and those data sets often have leveraged some third party to add context. That could have been interns, or sales folks on their team, or maybe they sent it out to an international outsourcing company to add metadata on top of their data. What they don't understand is that this comes with bias. When the people who are interacting with and describing your data are siloed and too like-minded, they will bring their presuppositions and understandings to bear when they make a judgment. If those are too similar, then you're going to see a trend that reflects those same presuppositions in judgment. And that's where you see bias starting to affect your data.

So one of the fantastic fringe benefits of crowdsourcing is that we can bring context from multiple countries, from multiple venues, from multiple strata of socio-economic status, that will all look at a problem differently. For example, we saw a customer that was building a fashion identification model, and they were seeing too many photos end up labeled with the models wearing boots. That just seemed really unusual. We went back and looked at the data they had and found that the outsourcing firm they had derived the data from was in India, and in that case, they called any high-heeled shoe a boot. No one knew that. So we ended up with that cultural bias driving noise in their data. So guess what? Their AI algorithms started calling every high-heeled shoe a boot. When they handed it to us, we were able to do descriptions from multiple different perspectives, from multiple different countries, and round out their data set substantially. So bringing in multiple perspectives is a fantastic mechanism for reducing bias in the data.

Q. What can you say about your future plans?

A. It’s pretty exciting. We are an artificial intelligence services company. We’re similar to an oilfield services company. If they say data is the new black, then we’re an oilfield services company for AI. We really feel like we’re positioned well regardless of what the actual ambitions of the companies wanting to utilize AI is going to be. We see the value that we’re offering across most multiple industries and we’re not getting siloed up into specific use cases. We’re not getting siloed up into specific buyer personas. It’s coming from everywhere,

Everyone has an AI project right now. Everyone is trying to figure out how they can monetize their data and how they can turn the decision-making in their organization to be influenced by machine intelligence. So we’re really excited right now. That is the reason customers are calling us right now, and that in turn opens up opportunities for us to hire people. In this AI revolution going on right now, we are hiring people and we are providing jobs here in the US and beyond, in a world where everyone is concerned that the robots are going to take their job. We can’t see an end to the growth of us hiring more people to help build these data sets. Let me just describe the AI lifecycle for you; I think it might be interesting to you.

When a customer first approaches us, we see where they are in their maturity with artificial intelligence. We call that the AI lifecycle, and we describe it like a bridge to artificial intelligence. On one side of the bridge is your manual judgment and your manual processes. And on the far side of the bridge is machine judgment and machine automated processes. And that’s where you want to go. That’s disruptive in your industry and that’s transformative. The companies are on the bridge whether they know it or not. They’re either on the near side or they’re on the far side or they’re somewhere in between. And anywhere they are on the bridge, they can use Alegion, and they can use human intelligence to spur on growth.

So if they’re on the near shore of the bridge, they can use crowdsourcing and Alegion to go out and collect new data or to structure the data that they already have so that they can train a model. And once they have a model, they can use human intelligence and Alegion to validate that model, to score it, to continuously improve it, to see where it needs to get better and where it needs to be improved. And then once the model is doing really well, you start to get this plateau where, “What do I do with the ones the model can’t do? Maybe we’re fine if the model can do 85% of the judgment, but what do I do with the last 15%?” Well, that’s where humans are still there doing exception handling, and then that exception handling judgment that they do still goes back to improve the model. So regardless of where they are on that bridge in moving into AI, Alegion has help for them in the form of humans. That’s the irony of it, right?

Q. So you are hiring people?

A. We’re absolutely hiring people. We’re really excited where AI is going. As much as possible, our goal is to involve humans in those AI projects to ensure high quality and to ensure that we’re providing jobs for plain old human beings for the long term.

For more information, go to Alegion.com.

5 Tips to Turn Your Data to Your Competitive Advantage

The need for competitive advantage sees companies increasingly turning to analytics to operationalize their data. Leveraging analytics from insight to artificial intelligence (AI), business leaders can make sense of their rapidly-growing piles of data to improve their operations. Here are my tips for using analytics to create a measurable business impact.

#1: Put the decision before the data

With a decision-first strategy you define the business objective first, then determine what data and analytics you need to achieve the goal. Extrapolating insights from huge amounts of data can be interesting, but it can also be a tremendous waste of time and resources if it doesn't solve a specific business challenge. If the modeling and data analytics requirements are defined by the business outcome first, data exploration and analytic development are faster and more productive. This helps enterprises narrow in on meaningful outcomes, shutting out extraneous noise and focusing on the insights that address specific objectives.

#2: Get data into decision makers’ hands

Empower business leaders with the ability to evaluate the complete spectrum of potential opportunities. This requires a combination of insight, advanced analytics and prescriptive decisioning to explore, simulate and pressure-test scenarios in real time. To do this, you need user-friendly decision management tools that can be rapidly configured and evolve with the specific needs of the operation. Experience has shown that when business experts have access to the data, insight, and tools to exploit analytics, they can visualize relationships between different variables and actions to quickly identify the preferred outcomes for maximum impact.

#3: AI & machine learning can expand your frontiers

Every decision that is made or action that is taken provides an opportunity to improve. The key is automatically feeding those learnings back into the analytic system to influence the next decision or action. By using decision management tools that incorporate machine learning and artificial intelligence, enterprises can conduct complex analysis that evolves and improves as new scenarios are added. With artificial intelligence and machine learning, you can discover unique insights and meaningful patterns in large volumes of data. Then, add self-learning models that will allow you to adapt quickly to changes in those patterns or take action on those insights. But, to unlock the full business potential, the analytic output must be explainable to a business expert if it is to be understood and accepted.
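As a simple illustration of feeding outcomes back into the analytics (a sketch with assumed feature names, thresholds and actions, not any specific vendor's decision-management API), an incrementally updated model can learn from each decision it helps make:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# An incrementally trainable model (logistic regression fit by SGD).
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

def decide(features):
    """Score the case and map the score to a decision."""
    proba = model.predict_proba(features.reshape(1, -1))[0, 1]
    return ("approve" if proba > 0.5 else "review"), proba

def learn_from_outcome(features, outcome):
    """Feed the observed outcome of the decision back into the model."""
    model.partial_fit(features.reshape(1, -1), np.array([outcome]), classes=classes)

# Bootstrap with a small batch of historical decisions, then update case by case.
X0, y0 = np.random.rand(100, 4), np.random.randint(0, 2, 100)
model.partial_fit(X0, y0, classes=classes)

case = np.random.rand(4)
action, score = decide(case)
learn_from_outcome(case, outcome=1)  # the observed result of the action taken
print(action, round(score, 3))
```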

#4: Keep it open and focus on integration

One of the easiest ways to start a healthy debate about analytics is to pose the question of which tool is best. The reality is that it depends on what you are trying to accomplish, and even more so on who is accomplishing it. One thing is for sure: you are almost certainly not starting from scratch and already have technology systems in place. Be certain any further investment is in analytic and decision management tools that are open and can easily integrate with your existing environment. The key requirement, however, is to understand how you will eventually use and manage those analytics within your day-to-day operation.

#5: Operationalize the analytics

The real value of analytics comes when they are operationalized. Connecting the data and insights gleaned from advanced analytics to day-to-day operations will tie to positive business outcomes. With prescriptive analytics, you can add business rules or optimization models to the analytics – which will trigger a specific action to be taken in different scenarios based on a deep understanding of the situation, predictions about the future, and other business constraints or regulations.
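For example, here is a minimal sketch of a prescriptive layer that maps a model's prediction plus business constraints to a concrete action; the threshold, cap and action names are illustrative assumptions:

```python
def prescribe_action(churn_probability: float, customer_value: float) -> str:
    """Map a model prediction plus business constraints to a concrete action."""
    MAX_DISCOUNT = 0.20  # e.g., a policy or regulatory cap on retention offers

    if churn_probability > 0.8 and customer_value > 1000:
        return f"offer retention discount of {MAX_DISCOUNT:.0%}"
    if churn_probability > 0.8:
        return "route to retention call queue"
    if churn_probability > 0.5:
        return "send satisfaction survey"
    return "no action"

print(prescribe_action(churn_probability=0.85, customer_value=2500))
```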

With these suggestions in mind, business leaders can help move their enterprise to a place where artificial intelligence and human intelligence come together to drive real business outcomes that deliver competitive advantage and better differentiation.

Read the source article in RTInsights.com.

These 4 Data Analytics Trends Will Dominate 2018

Together with AI, social, mobile and cloud, analytics and associated data technologies have emerged as core business disruptors in the digital age. As companies began the shift from being data-generating to data-powered organizations in 2017, data and analytics became the center of gravity for many enterprises. In 2018, these technologies need to start delivering value. Here are the approaches, roles and concerns that will drive data analytics strategies in the year ahead.

Data lakes will need to demonstrate business value or die

Data has been accumulating in the enterprise at a torrid pace for years. The internet of things (IoT) will only accelerate the creation of data as data sources move from web to mobile to machines.

“This has created a dire need to scale out data pipelines in a cost-effective way,” says Guy Churchward, CEO of real-time streaming data platform provider DataTorrent.

For many enterprises, buoyed by technologies like Apache Hadoop, the answer was to create data lakes — enterprise-wide data management platforms for storing all of an organization’s data in native formats. Data lakes promised to break down information silos by providing a single data repository the entire organization could use for everything from business analytics to data mining. Raw and ungoverned, data lakes have been pitched as a big data catch-all and cure-all.

But while data lakes have proven successful for storing massive quantities of data, gaining actionable insights from that data has proven difficult.

“The data lake served companies fantastically well through the data ‘at rest’ and ‘batch’ era,” Churchward says. “Back in 2015, it started to become clear this architecture was getting overused, but it’s now become the Achilles heel for real-time data analytics. Parking data first, then analyzing it immediately puts companies at a massive disadvantage. When it comes to gaining insights and taking actions as fast as compute can allow, companies relying on stale event data create a total eclipse on visibility, actions, and any possible immediate remediation. This is one area where ‘good enough’ will prove strategically fatal.”

Monte Zweben, CEO of Splice Machine, agrees.

“The Hadoop era of disillusionment hits full stride, with many companies drowning in their data lakes, unable to get a ROI because of the complexity of duct-taping Hadoop-based compute engines,” Zweben predicts for 2018.

To survive 2018, data lakes will have to start proving their business value, says Ken Hoang, vice president of strategy and alliances at data catalog specialist Alation.

“The new dumping ground of data — data lakes — has gone through experimental deployments over the last few years, and will start to be shut down unless they prove that they can deliver value,” Hoang says. “The hallmark for a successful data lake will be having an enterprise catalog that brings information discovery, AI, and information stewarding together to deliver new insights to the business.”

Read the source article at CIO.com.

Cybersecurity Suppliers Innovating to Bring AI-based Products to Market

If you want to understand what’s happening with artificial intelligence (AI) and cybersecurity, look no further than this week’s news.

On Monday, Palo Alto Networks introduced Magnifier, a behavioral analytics solution that uses structured and unstructured machine learning to model network behavior and improve threat detection. Additionally, Google’s parent company, Alphabet, announced Chronicle, a cybersecurity intelligence platform that throws massive amounts of storage, processing power, and advanced analytics at cybersecurity data to accelerate the search and discovery of needles in a rapidly growing haystack.

So, cybersecurity suppliers are innovating to bring AI-based cybersecurity products to market in a big way. OK, but is there demand for these types of advanced analytics products and services? Yes. According to ESG research, 12 percent of enterprise organizations have already deployed AI-based security analytics extensively, and 27 percent have deployed AI-based security analytics on a limited basis. These implementation trends will only gain momentum in 2018.

What’s driving AI-based cybersecurity technology adoption? ESG research indicates:

  • 29 percent want to use AI-based cybersecurity technology to accelerate incident detection. In many cases, this means doing a better job of curating, correlating, and enriching high-volume security alerts to piece together a cohesive incident detection story across disparate tools.
  • 27 percent want to use AI-based cybersecurity technology to accelerate incident response. This means improving operations, prioritizing the right incidents, and even automating remediation tasks.
  • 24 percent want to use AI-based cybersecurity technology to help their organization better identify and communicate risk to the business. In this case, AI is used to sort through mountains of software vulnerabilities, configuration errors, and threat intelligence to isolate high-risk situations that call for immediate attention.
  • 22 percent want to use AI-based cybersecurity technology to gain a better understanding of cybersecurity situational awareness. In other words, CISOs want AI in the mix to give them a unified view of security status across the network.

It’s important to point out that in each of these use cases, AI-based solutions don’t operate in a vacuum yet. Rather, they provide incremental analytics horsepower to existing technologies, driving greater efficacy, efficiency, and value.

This tends to happen in one of two ways. In some cases, machine learning technologies are applied to existing security defenses as helper apps. For example, Bay Dynamics and Symantec have formed a partnership that applies Bay’s AI engine behind Symantec DLP to help reduce the noise associated with DLP alerts. Fortscale does similar things by back-ending endpoint detection and response (EDR), identity and access management (IAM), cloud access security brokers (CASB), etc.

Alternatively, some AI-based solutions work on a stand-alone basis but are also tightly coupled with the various other technologies of a security operations and analytics platform architecture (SOAPA). Vectra Networks and E8 Security are often integrated with SIEM and EDR tools. Kenna Security works hand in hand with vulnerability scanners. Splunk and Caspida are tightly integrated, as are IBM QRadar and Watson.

Read the source article at CSO.

Facebook Launches A Huge Play In ‘Big Data’ Analytics (All you need to know)

When Facebook CEO Mark Zuckerberg took Wall Street analysts’ questions about the company’s Q1 2013 earnings a few years ago, one topic kept coming up: Facebook’s new Big Data capabilities.

Most people think Facebook is a straightforward advertising play: it has a massive audience of 1 billion users, and advertisers can purchase ads targeting slices of that audience. What could be simpler?

Yet in the earnings release, three of the six highlights of the quarter were about “Big Data,” the popular notion that the future of marketing lies in super-deep, super-complex big data analytics rather than the raw power to deliver lots of eyeballs.

Measurement, measurement, measurement

Here are Facebook’s Big Data moves in Q1:

Launched new advertising products such as Lookalike Audiences, Managed Custom Audiences, and Partner Categories, which enable marketers to improve their targeting capabilities on Facebook.

Partnered with Datalogix, Epsilon, Acxiom, and BlueKai to enable marketers to incorporate off-Facebook purchasing data in order to deliver more relevant ads to users.

Improved the ability to measure advertisers’ return on investment in digital media across the internet through the acquired Atlas Advertising Suite.

The first two key points underline what Facebook is doing. Most people have no clue what Datalogix, Epsilon, Acxiom, and BlueKai do. Insiders, however, know that Facebook’s alliances with these companies give it one of the most powerful consumer databases on the planet.

The data stored by these companies tends to be anonymized or “hashed.” They are not able to identify Jane Smith, shoe shopper from Montclair, N.J., as a person. But they can identify thousands of Janes who shop for shoes in any zip code you want, in aggregate.

This is now combined with Facebook’s own data — profiles of 1 billion-plus users who are, for the most part, happily reporting the details of their lives, and their shopping, on Facebook.

Second only to Google

It’s sensational stuff in terms of scale. However, Zuckerberg indicated that Facebook is still in the baby steps phase of its big data plan, because the last piece of the arrangement — Atlas — isn’t even fully plugged in yet.

Atlas is a huge internet ad server, previously owned by Microsoft. It resembles the plumbing of the web: it serves up ads all over the internet and takes a cut from any advertiser using its services. Atlas carries between 10% and 15% of all ads consumers see on the web, according to LeadLedger. It is second in size only to Google’s DoubleClick ad server.

Most people have yet to digest that reality: Facebook is now the second biggest web ad server after Google. The deal closed only recently, and Facebook has yet to report the revenue impact, if any, of Atlas in its numbers.

Atlas has yet to shrug.

Many people assumed that Facebook purchased Atlas because it wanted to build an off-Facebook ad network, possibly one in which Facebook data could be used to improve targeting on Atlas. But that is not the primary goal for Atlas. Facebook has been very clear about why it acquired Atlas from Microsoft: it wants the data Atlas can provide.

Zuckerberg also said on his Q1 call that Atlas is a significant part of continuing to build up Facebook’s measurement capabilities: he wants Facebook to be able to tell advertisers how their ads perform even when consumers are offline and haven’t been anywhere near Facebook before going shopping:

We believe the Atlas platform will enable us to demonstrate much more clearly the connection between ad impressions and purchases. We could enable marketers to better measure the effectiveness of their ad impressions, not just on Facebook but across the whole internet. This means we can take the improvements we have made in measurement on Facebook, including our integrations with Nielsen and Datalogix, and extend them to a much bigger audience and many more purchases.

… Our focus with Atlas is on impression-based ads. The idea is that, historically, a lot of online advertising was centered on search, and attribution was always about that last click. As people have looked more holistically at all the ad spending they are doing, it’s not just the last click that matters but all the impressions leading up to that click. We also drive sales offline, and offline people aren’t clicking through to a purchase — they are walking into a store. In other words, there is no last click.

So our focus with Atlas is to take that technology and improve our ability to connect ad impressions to purchase behavior both offline and online, and not just for clicks but across the various purchases people make. That is exactly why we made that acquisition.

Most ordinary Facebook users don’t realize how ambitious these plans are. If you purchased something with a credit or debit card in the last couple of years, you’re most likely in Facebook’s big data pool right now.

Deeper look into Big Data vs Data Mining vs Data Visualization Tools vs Business Intelligence and their integration

Big Data

Big data refers to data sets that are so large and complex that conventional data processing software is unable to manage them. Big data challenges include data collection, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and data privacy. There are three main dimensions of big data: Volume, Variety and Velocity.

More recently, the term “big data” tends to refer to the use of predictive analytics, user behavior analytics, or other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem.

Analysis of data sets can uncover new correlations to spot business trends, prevent diseases, combat crime and so on. Scientists, business executives, medical practitioners, advertisers and governments alike regularly face difficulties with large data sets in areas including Internet search, urban informatics, fintech, and business informatics. Scientists also encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.

Data Mining

Data mining is the computational process of discovering patterns in large data sets using methods at the intersection of machine learning, database systems and statistics. It is an essential process in which intelligent methods are applied to extract data patterns, and it is an interdisciplinary branch of computer science. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the “knowledge discovery in databases” process, or KDD.
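A minimal Python sketch of those steps with scikit-learn (an illustrative example, not a definition of KDD): pre-processing and transformation feed a mining step that extracts structure from the raw records. The dataset and column names are made up for the example.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

raw = pd.DataFrame({
    "age": [23, 45, 31, 52, 38, 27],
    "annual_spend": [1200, 5400, 2100, 8700, 3300, 1500],
})

# Pre-processing/transformation (scaling) feeds a mining step (clustering)
# that extracts a structure -- customer segments -- from the raw records.
kdd = Pipeline([
    ("scale", StandardScaler()),
    ("mine", KMeans(n_clusters=2, n_init=10, random_state=0)),
])
raw["segment"] = kdd.fit_predict(raw[["age", "annual_spend"]])
print(raw)
```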

The term is a misnomer because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It is also a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any use of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The book Data Mining: Practical Machine Learning Tools and Techniques with Java (which covers mostly machine learning material) was originally to be named just Practical Machine Learning, and the term data mining was added for marketing reasons. Often the more general terms (large-scale) data analysis and analytics – or, when referring to the actual methods, artificial intelligence and machine learning – are more appropriate.

Data Visualization

Many disciplines view data visualization (or data visualisation) as a modern equivalent of visual communication. It involves the creation and study of the visual representation of data, meaning “information that has been abstracted in some schematic form, including attributes or variables for the units of information.”

A primary goal of data visualization is to communicate information clearly and efficiently using statistical graphics, plots and information graphics. Numerical data may be encoded using dots, lines, or bars to convey a quantitative message visually. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific value, while charts of various types are used to show patterns or relationships in the data for one or more variables.
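A minimal sketch of those basic encodings using matplotlib; the monthly figures are made up purely for illustration.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 150, 160, 158, 170]
returns = [8, 9, 7, 12, 10, 9]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(sales, returns)   # dots: relationship between two variables
axes[0].set(xlabel="sales", ylabel="returns", title="Scatter")
axes[1].plot(months, sales)       # a line: trend over time
axes[1].set(title="Trend")
axes[2].bar(months, returns)      # bars: comparison across categories
axes[2].set(title="Comparison")
plt.tight_layout()
plt.show()
```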

Data visualization is both an art and a science. It is viewed by some as a branch of descriptive statistics, and by others as a grounded theory development tool. The increasing amounts of data created by Internet activity and the growing number of sensors in the environment are referred to as “big data” or the Internet of Things. Processing, analyzing and communicating this data present ethical and analytical challenges for data visualization. The field of data science and the practitioners called data scientists help tackle this challenge.

Business Intelligence

Business intelligence (BI) is the technology-driven process for analyzing data and presenting actionable information to help managers, executives, and other corporate end users make informed business decisions.

Business intelligence vs. data analytics

Sporadic use of the term business intelligence can be traced back as far as the 1860s; however, consultant Howard Dresner is credited with first proposing it in 1989 as an umbrella phrase for applying data analysis techniques to support business decision-making processes. What came to be known as BI tools evolved from earlier, often mainframe-based analytical systems, such as decision support systems and executive information systems.

Business intelligence is sometimes used interchangeably with business analytics; in other cases, business analytics is used either more narrowly to refer to advanced data analytics or more broadly to include both BI and advanced analytics.

Why is business intelligence important?

The potential benefits of business intelligence tools include accelerating and improving decision-making, optimizing internal business processes, increasing operational efficiency, driving new revenues and gaining competitive advantage over business rivals. BI systems can also help companies identify market trends and spot business problems that need to be addressed.

10 New Big Data Tools for Data Analysis, Data Visualization, Business Intelligence and Data Mining

When we talk about data, a number of things come to mind, but data is not meaningful until it is sorted and analyzed, and for that, software is needed. Not just any software will do: there are specialized tools that make the process far more effective.

There are a significant number of big data tools out there that can be used for data analysis today. Big data analysis is the process of cleaning, inspecting, modeling and transforming big data with the primary aim of discovering useful information, reaching logical conclusions, and supporting decision making.

Big Data Tools for Data Analysis, Data Mining, and Data Visualization

1. Knime

KNIME Analytics Platform is a leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures.

With more than 1,000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and a wide choice of advanced algorithms, KNIME Analytics Platform is an ideal toolbox for any data scientist.

2. OpenRefine

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. OpenRefine can help you explore large data sets with ease.

3. R-Programming

What if I told you that Project R, a GNU project, is largely written in R itself? Its core is written in C and Fortran, and a considerable number of its modules are written in R. It is a free programming language and software environment for statistical computing and graphics. The R language is widely used among data miners for developing statistical software and data analysis. Its ease of use and portability have raised R's prominence substantially in recent years.

Besides data mining, it provides statistical and graphical techniques, including linear and time-series analysis, classical statistical tests, nonlinear modeling, clustering, classification, and many others.

4. Orange

Orange is an open source data visualization and data analysis tool for novice and expert alike, providing a large toolbox for building interactive workflows to analyze and visualize data. Orange is packed with a variety of visualizations, from scatter plots, bar charts and trees to dendrograms, networks and heat maps.

5. RapidMiner

Just like KNIME, RapidMiner operates through visual programming and is well equipped for manipulating, analyzing and modeling data. RapidMiner makes data science teams more productive through an open source platform for data prep, machine learning, and model deployment. Its unified data science platform accelerates the building of complete analytical workflows – from data prep to machine learning to model validation to deployment – in a single environment, dramatically improving efficiency and shortening the time to value for data science projects.

6. Pentaho

Pentaho addresses the barriers that block your organization's ability to get value from all of your data. The platform simplifies preparing and blending any data and includes a spectrum of tools to easily analyze, visualize, explore, report and predict. Open, extensible and embeddable, Pentaho is built to make sure that every team member — from developers to business users — can easily turn data into value, however they want to use it.

7. Talend

Talend is a leading open source integration software provider to data-driven enterprises. Its customers connect anywhere, at any speed. From ground to cloud and batch to streaming, for data or application integration, Talend connects at big data scale, 5x faster and at 1/5th the cost.

8. Weka

Weka, an open source software package, is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a data set or called from your own Java code. It is also well suited for developing new machine learning schemes, since it is written entirely in Java and supports several standard data mining tasks.

For people who have not coded for a while, Weka and its GUI provide one of the easiest transitions into the world of data science. Those with some Java experience can also call and make use of the library in their own code.

9. NodeXL

NodeXL is data visualization and analysis software for relationships and networks that provides exact calculations. The basic version (as opposed to the Pro edition) is free, open-source network analysis and data visualization software. It is one of the better statistical tools for data analysis, with advanced network metrics, access to social media network data importers, and automation.

10. Gephi

Gephi is also an open-source network analysis and data visualization software package, written in Java on the NetBeans platform. Think of the giant friendship maps that represent LinkedIn or Facebook connections. Gephi takes that a step further by providing accurate calculations.
