Entrepreneurs Taking on Bias in Artificial Intelligence

Whether it’s a navigation app such as Waze, a music recommendation service such as Pandora or a digital assistant such as Siri, odds are you’ve used artificial intelligence in your everyday life.

“Today 85 percent of Americans use AI every day,” says Tess Posner, CEO of AI4ALL.

AI has also been touted as the new must-have for business, powering everything from customer service to marketing to IT. However, for all its usefulness, AI also has a dark side. In many cases, the algorithms are biased.

Some examples of bias are blatant, such as Google Photos tagging photos of black people as gorillas, or an algorithm used by law enforcement to predict recidivism disproportionately flagging people of color. Others are more subtle. When Beauty.AI held an online contest judged by an algorithm, the vast majority of “winners” were light-skinned. Search Google for images of “unprofessional hair” and the results will be mostly pictures of black women (even searching for “man” or “woman” brings back images of mostly white individuals).

While more light has been shone on the problem recently, some feel it is still not addressed enough in the broader tech community, let alone in university research or in the government and law enforcement agencies that implement AI.

“Fundamentally, bias, if not addressed, becomes the Achilles’ heel that eventually kills artificial intelligence,” says Chad Steelberg, CEO of Veritone. “You can’t have machines where their perception and recommendation of the world is skewed in a way that makes its decision process a non-sequitur from action. From just a basic economic perspective and a belief that you want AI to be a powerful component to the future, you have to solve this problem.”

As artificial intelligence becomes ever more pervasive in our everyday lives, there is now a small but growing community of entrepreneurs, data scientists and researchers working to tackle the issue of bias in AI. I spoke to a few of them to learn more about the ongoing challenges and possible solutions.

Cathy O’Neil, founder of O’Neil Risk Consulting & Algorithmic Auditing

Solution: Algorithm auditing

Back in the early 2010s, Cathy O’Neil was working as a data scientist in advertising technology, building algorithms that determined what ads users saw as they surfed the web. The inputs for the algorithms included innocuous-seeming information like what search terms someone used or what kind of computer they owned.

However, O’Neil came to realize that she was actually creating demographic profiles of users. Although gender and race were not explicit inputs, O’Neil’s algorithms were discriminating against users of certain backgrounds, based on the other cues.

As O’Neil began talking to colleagues in other industries, she found this to be fairly standard practice. These biased algorithms weren’t just deciding what ads a user saw, but arguably more consequential decisions, such as who got hired or whether someone would be approved for a credit card. (These observations have since been studied and confirmed by O’Neil and others.)

What’s more, in some industries, such as housing, if a human made decisions based on the same set of criteria, it would likely be illegal under anti-discrimination laws. But because an algorithm was deciding, and gender and race were not explicit inputs, the decision was assumed to be impartial.
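To make the mechanism concrete, here is a minimal, hedged sketch in Python on synthetic data: a toy lending model never sees the protected attribute, yet a correlated “neutral” feature (a hypothetical ZIP-code risk score) carries the disparity through to its decisions.

```python
# Synthetic illustration of proxy discrimination: `group` is never a model
# input, but `zip_risk` is correlated with it, so approval rates diverge.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                      # protected attribute (withheld from the model)
zip_risk = rng.normal(loc=group * 1.0, scale=1.0)  # "neutral" feature correlated with group
income = rng.normal(loc=50.0, scale=10.0, size=n)

# Historical outcomes already reflect the proxy's influence
approved_history = (income / 10 - zip_risk + rng.normal(size=n) > 3).astype(int)

X = np.column_stack([zip_risk, income])            # note: `group` is excluded
model = LogisticRegression().fit(X, approved_history)
decisions = model.predict(X)

for g in (0, 1):
    print(f"group {g}: approval rate = {decisions[group == g].mean():.2%}")
```

Although neither race nor gender appears anywhere in the feature matrix, the two groups receive noticeably different approval rates, which is exactly the pattern O’Neil describes.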

“I had left the finance [world] because I wanted to do better than take advantage of a system just because I could,” O’Neil says. “I’d entered data science thinking that it was less like that. I realized it was just taking advantage in a similar way to the way finance had been doing it. Yet, people were still thinking that everything was great back in 2012. That they were making the world a better place.”

O’Neil walked away from her adtech job. She wrote a book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracyabout the perils of letting algorithms run the world, and started consulting.

Eventually, she settled on a niche: auditing algorithms.

“I have to admit that it wasn’t until maybe 2014 or 2015 that I realized this is also a business opportunity,” O’Neil says.

Right before the election in 2016, that realization led her to found O’Neil Risk Consulting & Algorithmic Auditing (ORCAA).

“I started it because I realized that even if people wanted to stop unfair or discriminatory practices, they wouldn’t actually know how to do it,” O’Neil says. “I didn’t actually know. I didn’t have good advice to give them.” But she wanted to figure it out.

So, what does it mean to audit an algorithm?
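One simple, widely used starting point, offered here as a generic illustration rather than ORCAA’s actual methodology, is the disparate impact ratio: compare favorable-outcome rates across groups, where a ratio below 0.8 (the “four-fifths rule”) is a common red flag.

```python
# A minimal audit check: the disparate impact ratio over model outcomes.
def disparate_impact_ratio(outcomes, groups, favorable=1):
    """Lowest group's favorable-outcome rate divided by the highest group's."""
    rates = {}
    for g in set(groups):
        member_outcomes = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(o == favorable for o in member_outcomes) / len(member_outcomes)
    return min(rates.values()) / max(rates.values())

# Toy data: hiring decisions (1 = hired) for two groups
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups   = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
print(f"disparate impact ratio: {disparate_impact_ratio(outcomes, groups):.2f}")  # 0.67, below 0.8
```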

Four Suggestions for Using a Kaggle Competition to Test AI in Business

According to a McKinsey report, only 20% of companies consider themselves adopters of AI technology while 41% remain uncertain about the benefits that AI provides. Considering the cost of implementing AI and the organizational challenges that come with it, it’s no surprise that smart companies seek ways to test the solutions before implementing them and get a sneak peek into the AI world without making a leap of faith.

That’s why more and more organizations are turning to data science competition platforms like Kaggle, CrowdAI and DrivenData. Making a data science-related challenge public and inviting the community to tackle it comes with many benefits:

  • Low initial cost – the company needs only to provide data scientists with data, pay the entrance fee and fund the award. There are no further costs.
  • Validating results – participants provide the company with verifiable, working solutions.
  • Establishing contacts – A lot of companies and professionals take part in Kaggle competitions. The ones who tackled the challenge may be potential vendors for your company.
  • Brainstorming the solution – data science is a creative field, and there’s often more than one way to solve a problem. Sponsoring a competition means you’re sponsoring a brainstorming session with thousands of professional and passionate data scientists, including the best of the best.
  • No further investment or involvement – the company gets immediate feedback. If an AI solution is deemed efficacious, the company can move forward with it; otherwise, its involvement ends with funding the award, avoiding further costs.

While numerous organizations – big e-commerce websites and state administrations among them – sponsor competitions and leverage the power of the data science community, running a competition is not at all simple. An excellent example is the competition the US National Oceanic and Atmospheric Administration sponsored when it needed a solution that would recognize and differentiate individual right whales. Ultimately, what proved most efficacious was the principle of facial recognition, applied instead to the topsides of the whales, which were obscured by weather, water and the distance between the photographer above and the whales far below. To check whether this was even possible, and how accurate a solution might be, the organization ran a Kaggle competition, which deepsense.ai won.

Having won several such competitions, we have encountered both brilliant and not-so-brilliant ones. That’s why we decided to prepare a guide for every organization interested in testing potential AI solutions in Kaggle, CrowdAI or DrivenData competitions.

Recommendation 1. Deliver participants high-quality data

The quality of your data is crucial to attaining a meaningful outcome. Without the data, even the best machine learning model is useless. This also applies to data science competitions: without quality training data, the participants will not be able to build a working model. This is a particular challenge with medical data, where obtaining enough information is problematic for both legal and practical reasons.

  • Scenario: A farming company wants to build a model to identify soil type from photos and probing results. Although there are six classes of farming soil, the company is able to deliver sample data for only four. Considering that, running the competition would make no sense – the machine learning model wouldn’t be able to recognize all the soil types.

Advice: Ensure your data is complete, clear and representative before launching the competition.
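As a sketch of what that pre-launch check might look like for the soil scenario above (the file, column and class names are hypothetical):

```python
# Verify every expected class is present and reasonably represented before
# publishing the competition dataset.
import pandas as pd

EXPECTED_CLASSES = {"clay", "silt", "sand", "loam", "peat", "chalk"}  # six soil types
MIN_SHARE = 0.05  # assumed floor: at least 5% of samples per class

df = pd.read_csv("training_data.csv")  # hypothetical competition dataset
shares = df["soil_type"].value_counts(normalize=True)

missing = EXPECTED_CLASSES - set(shares.index)
underrepresented = {c: round(s, 3) for c, s in shares.items() if s < MIN_SHARE}

if missing or underrepresented:
    print("Do not launch yet.")
    print("  missing classes:", sorted(missing))
    print("  underrepresented classes:", underrepresented)
else:
    print("All classes present and reasonably represented.")
```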

Recommendation 2. Build clear and descriptive rules

Competitions are put together to achieve goals, so the model has to produce a useful outcome. And “useful” is the operative word. Because competition participants are usually not professionals in the field they’re building a solution for, the rules need to be grounded in the use case and the model’s intended application. Including even basic guidelines will help participants address the challenge properly. Without these foundations, the outcome may be technically correct but totally useless.

  • Scenario: A mapping of the distribution of children below the age of 7 in a city will be used to optimize social, educational and healthcare policies. For the mapping to be useful, it is crucial to include additional guidelines in the rules: the areas mapped need to be bordered by streets, rivers, rail lines, district boundaries and other topographical features of the city. Without these, many models may map the distribution by cutting the city into 10-meter-wide, kilometer-long stripes: the segmentation is technically done, but the outcome is totally useless because the competition rules lacked proper guidelines.

Advice: Think about usage and include the respective guidelines within the rules of the competition to make it highly goal-oriented and common sense driven.

Read the source article at deepsense.ai.

Fluid Data Strategy Needed to Keep Tech Mapped to Business Plan

By Mahesh Lalwani, Vice President, Head of Data & Cognitive Analytics at Mphasis

In today’s world, it should no longer be acceptable to have merely adaptive data. To win customers and market share, an organization must do far more and predict which strategy will unlock the potential its data has to offer. A company must envision how it will compete against today’s known players and future disruptors. Additionally, it needs to anticipate how government rules and regulations will affect its playing field, and it must protect its brand in hostile environments.

Ask any CIO or CDO and they will tell you that it’s fairly complex.

To move an organization onto a more advanced plan of action, CIOs and other executives can think of data strategy in the simple terms of business drivers and technology enablers and how to constantly evolve it. Automation is a business driver that commonly prompts companies to consider new data strategies. As the imperative to run leaner operations grows, enterprises find it valuable to automate business processes to help expedite work that ordinarily takes up long periods of time. A fluid data strategy allows a business to mine the information on how a certain manual function was done in order to automate it. A common tech enabler that actualizes this transformation is Artificial Intelligence (AI). Mimicking the way the human mind works, tools enabled by AI can gather the needed data and build a prototype of the tasks that are to be automated.

Figure-1 illustrates some of the drivers that can shape your data strategy. On the vertical axis, it shows innovation versus risks and regulations, and on the horizontal, centralized IT versus business users, because these represent opposing priorities in most cases. The business drivers in the upper half help you increase top and bottom lines, whereas those in the lower half keep you from paying hefty fines for non-compliance. The right half represents the priorities of your business users and lines of business, and the left half is what keeps your centralized IT occupied.

Based on this situation, budget, resources, and predicted future needs, the recommendation would be to focus on just a few interconnected drivers for the next six months. As part of the data strategy, the organization would establish the selected drivers as business goals, allocate specific budgets, bring together teams that understand the impacted systems and processes, and define how to measure success and monitor progress.

In another example, suppose an organization wants to reduce costs through lean IT and create new products based on data insights. At the same time, they also want to identify technologies to enable this new data strategy for the next six months. Figure-2 indicates that the organization may want to focus on the creation of dashboards to show how its products and revenues stack up, along with the building of data lakes and automating of data ingestion from upstream sources. One will help identify strengths and gaps in offerings, and the other will create a platform for the future.

Redefining Data Strategy: The Holy Grail of Marketing

Once it has successfully achieved these goals, an organization may want to redefine its data strategy to take up more challenging goals such as the holy grail of marketing: a “cradle-to-grave” lifecycle journey. That will require allocating new budgets, adding experienced marketing analysts and data scientists to the team, and ingesting new datasets into the data lake from Web analytics, marketing automation, and CRM systems, among others.

With time, an organization can learn to (a) strike a balance between competing priorities, (b) keep all teams in sync to achieve new goals every few months as part of a fluid data strategy, and (c) monitor progress frequently. It can become a champion at predicting and defining the right drivers and selecting suitable technology enablers from the likes of Figure-1 and Figure-2 to create custom, fluid shapes outlining its Agile data strategy.

The new trends observed in the data landscape that will guide organizations in refining their data strategy are indicated in Figure-3.

Business Intelligence

Most business intelligence today is backward-looking and obsolete. Data science and AI give you the tools to mine your data and build models that accurately predict the future. Data science uncovers insights that are otherwise extremely difficult, if not impossible, to achieve. AI helps automate decision-making based on learning.

Data Warehousing

The role of data warehousing has been extended to include data lakes, saving cost and offering the flexibility of the cloud. Data lakes can help reduce computing and storage requirements and costs by ingesting raw data from the data warehouse, performing ETL, and returning aggregates to the data warehouse, allowing existing downstream applications to work without any change.
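As a rough sketch of that offload pattern (the connection string, table and column names are all hypothetical), the lake layer pulls raw rows, aggregates them, and hands compact summaries back to the warehouse:

```python
# Offload heavy aggregation from the warehouse, then return only aggregates
# so existing downstream applications keep working unchanged.
import pandas as pd
import sqlalchemy

warehouse = sqlalchemy.create_engine("postgresql://user:pass@warehouse/sales")  # hypothetical

raw = pd.read_sql("SELECT product_id, sale_date, amount FROM raw_sales",
                  warehouse, parse_dates=["sale_date"])

daily = (
    raw.groupby(["product_id", pd.Grouper(key="sale_date", freq="D")])
       .agg(revenue=("amount", "sum"), orders=("amount", "count"))
       .reset_index()
)

# Return only the compact aggregate to the warehouse
daily.to_sql("daily_sales_agg", warehouse, if_exists="replace", index=False)
```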

Traditional Master Data Management (MDM)

Most traditional MDM initiatives are starting to be seen as never-ending and as providing little, if any, value. Instead, Agile MDM has emerged as far more productive and useful, with use-case specific minimum viable data, automated data quality improvements, and reference data updates with AI in the data pipeline.

Single Version of Truth

Most organizations have considered creating a single version of truth for some of their enterprise datasets. A few resourceful companies have even used semantic modeling to bring different versions closer. A better approach, though, involves having a single source of truth while allowing many versions of truth. For instance, how many customers paid for a particular movie stream will likely differ from how many customers watched it in a given month. The first number is of interest to the accounting department and the second to marketing, so while they represent different versions of the truth, they originate from a single source of data.
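A toy sketch of that idea (the schema is hypothetical): both departments read the same event table, yet each legitimately arrives at a different number.

```python
# One source of truth, two versions of truth: paid vs. watched.
import pandas as pd

events = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "paid":        [True, True, False, True, False],  # was the stream charged?
    "watched":     [True, False, True, True, True],   # did playback happen?
})

paid_customers    = events.loc[events["paid"], "customer_id"].nunique()
watched_customers = events.loc[events["watched"], "customer_id"].nunique()

print(f"accounting's number (paid):   {paid_customers}")     # 3
print(f"marketing's number (watched): {watched_customers}")  # 4
```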

Batch and Files

One more trend we have all seen is the use of real-time streams instead of batches and files. Data’s value decreases quickly with time, so it is best to analyze it in flight, before storing it. Also, the more we store, the more data debt (data we still need to analyze) we collect. Most of the time, it makes sense to reduce or throw away the unimportant raw data and store only compact summarized or aggregate data, which should be made available as a service to other systems so they can harness more value from your data.

In summation, all businesses clearly stand to gain from adopting what can be called a fluid data strategy. Such an approach gives enterprises the flexibility to pick only those business drivers and tech enablers that are relevant to their business plan. It also provides companies with the room to come back and review their choices every couple of months to tweak and rethread their strategy according to new trends and goals.

Read the source article at ITProPortal.

 

Here are the Top 5 Languages for Machine Learning, Data Science

Careers in data science, artificial intelligence, machine learning, and related technologies are considered among the best choices to pursue in an uncertain future economy where many jobs may end up automated and performed by robots and AI.

Yet in spite of the likely strong and secure future of these careers, the job marketplace remains fundamentally unbalanced. There are still many more jobs open and available than there are qualified applicants to fill those jobs. Just do a search on Monster for the keyword machine learning and you will find thousands of job openings across the country.

Whether you are just starting out in your IT career or you are watching high-profile IT layoffs and considering the best new skills to learn, chances are you are wondering what the best skills are to emphasize on your LinkedIn profile and the best skills to focus on in the next online course you take. What programming language is the most likely to secure your future?

Through our regular discussions with executives, recruiters, and practitioners in the field, we’ve come up with a short list for you. You may already have one or more of these skills. Maybe you are wondering about the best one to learn next. Here’s our list. If you see one that you think we missed, please let us know in the comments section.

R

R remains one of the top languages for data science. First developed in the 1990s, this open source language has its roots in statistics, data analysis, and data visualization. In recent years it has become the choice of a new generation of analysts who appreciate the active open source community, the fact that they can download the software for free, and the downloadable packages available to customize the tool. Tech giant Microsoft has also embraced the platform, acquiring Revolution Analytics, a commercially supported enterprise platform for R, in 2015.

Java

Java has also been around since the early 1990s, and back then was famous for its “write once, run anywhere” design, originating inside Sun Microsystems. Sun may no longer exist, having been acquired by Oracle, but Java seems here to stay, and it’s one of the languages you will likely encounter in your career as a machine learning specialist. Many of the machine learning job description ads out there specify Java as one of the languages they’d like for you to know. Chances are if you’ve been in development at all over the last 20 years, you’ve acquired a little bit of experience with Java. And if you feel like you need a little more hands-on experience, it’s pretty easy to find an online course.

Scala

Scala is another language that has been popular with data scientists and machine learning specialists. You’ll see this one mentioned most often in job ads where real-time data analysis is important to the role. It is an implementation language of technologies that enable streaming data, such as Spark and Kafka. Scala combines functional and object-oriented programming and works with both Java and JavaScript.

C and C++

These languages have also been around for decades, and you may see them mentioned in machine learning job ads in the same sentence as some other more popular languages for machine learning. Organizations may be looking to add machine learning to existing projects that were built in these languages and so they may be looking for this kind of expertise. But if you are looking for a first language to learn for use with machine learning, it’s probably not one of these.

Python

Right now, Python is probably the top language to learn if you are looking to skill up in areas around machine learning. Just check out online machine learning courses that are available today. Chances are the one you pick will be using Python as the language of choice.

You’ll also find that Python is probably the top named language skill in job ads for machine learning specialists, and certainly also mentioned in many ads for data scientists and analysts, too. If you have to choose one skill to learn this year, Python is a great choice.
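Part of the reason is how little ceremony Python’s ML ecosystem requires. A complete train-and-evaluate loop is only a few lines with scikit-learn, as this minimal sketch shows:

```python
# A full train/evaluate loop on a bundled dataset with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2%}")
```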

Read the source article in InformationWeek.

Accenture: Most Health Organizations Can’t Ensure Responsible AI Use

Despite a growing interest in artificial intelligence, most healthcare organizations still lack the tools necessary to ensure responsible use of such technologies, finds a report from Accenture Health.

According to the report, Digital Health Technology Vision 2018, 81% of healthcare executives said they are not yet prepared to face the societal and liability issues that will require them to explain their AI systems’ decisions. Additionally, while 86% of respondents said that their organizations are using data to drive automated decision-making, the same proportion (86%) reported they have not invested in the capabilities needed to verify data sources across their most critical systems.

Kaveh Safavi, head of Accenture’s health practice, observed that the current lack of AI data verification investment activity is exposing healthcare organizations to inaccurate, manipulated and biased data that can lead to corrupted insights and skewed results. “The 86% figure is critical,” he stated, “given that 24% of executives also said that they have been the target of adversarial AI behaviors, such as falsified location data or bot fraud on more than one occasion.” On a positive note, the study found that 73% of respondents plan to develop internal ethical standards for AI to ensure that their systems act responsibly.

As a growing number of AI-powered healthcare tools enter the market, hospitals, clinics and other healthcare organizations are using intelligent technologies in various ways to become more agile, productive and collaborative. “Until recently, AI was mainly used as a back-end tool but is increasingly becoming part of the everyday consumer and clinician experience,” Safavi noted.

AI’s ability to sense, understand, act and learn enables it to augment human activity by supporting, or even taking over, tasks that bridge administrative and clinical healthcare functions — from risk analysis to medical imaging to supporting human judgment. “In terms of value and cost-savings, there are many ways in which AI can improve and change healthcare,” Safavi observed. Accenture estimates that key clinical health AI applications can create $150 billion in annual savings for the U.S. healthcare economy by 2026.

Moving toward greater compliance

Healthcare organizations recognize the value of AI not only for its potential cost savings, but also for its ability to tackle entrenched issues related to sustainability and access. Adopters are also hoping that AI will help them address the growing healthcare workforce shortage and the increasing dissatisfaction of healthcare consumers. “AI can help increase productivity and personalization in healthcare in ways that few other technologies can,” Safavi explained.

Ultimately, healthcare organizations will need to turn to AI-powered automation to improve a wide range of services. “Because of this, the market will help drive compliance over time as trust both from consumers and clinicians is the only way to truly foster adoption,” Safavi predicted.

Read the source article in Information Week.

How to Land a Data Scientist Job at Your Dream Company — My Journey to Airbnb

By Kelly Peng, Data Scientist, Airbnb

I just started my new job at Airbnb as a data scientist a month ago, and I still feel incredibly lucky to be here. Nobody knows how much I wanted to join this company: I had pictures of the Airbnb office pinned in front of my desk; my iPhone wallpaper was a photo of me standing in front of the Airbnb logo; I applied for positions at Airbnb four times, but only heard back from a recruiter the last time…

In the past, when people asked me, “Which company do you want to work for the most?” I dared not say “Airbnb,” because whenever I did, they would reply, “Do you know how many people want to work there? How many of them eventually get in? Come on, be realistic.”

The result proves that nothing is impossible. Since many of my friends have asked me to share my job search experience, I thought it might be helpful to write it down and share it.

Some data…

To provide an overview of my job search process:

  • Applications: 475
  • Phone interviews: 50
  • Finished data science take-home challenges: 9
  • Onsite interviews: 8
  • Offers: 2
  • Time spent: 6 months

As you can probably see from the data, I was not a strong candidate; otherwise, I would have applied to just a few positions and received a bunch of offers. Yes, I used to be super weak; I used to be the kind of candidate who wastes interviewers’ time. But “who you were months ago doesn’t matter, what matters is who you are becoming.”

The road less traveled to a data scientist job

A little bit about my background: I obtained a Bachelor’s degree in Economics from a university in China and a Master’s degree in Business Administration from the University of Illinois at Urbana-Champaign. After graduating, I worked as a data analyst for two years: 7 months as a contractor at Google, and another 1 year and 4 months at a startup. My work was mostly about writing SQL queries, building dashboards, and giving data-driven recommendations.

After realizing that I was not learning and growing as I had hoped, I left my job and applied for the Galvanize Data Science Immersive program, a 12-week boot camp in San Francisco. I failed the statistics interview to enter the boot camp four times, and was admitted after taking it for the fifth time.

The content taught at Galvanize was heavy on Python and machine learning, and it assumed you already had a strong foundation in statistics. Unsurprisingly, I struggled a lot in the beginning, because I didn’t know much about programming, nor was I strong in statistics. I had no choice but to work really hard. During my time at Galvanize, I had no breaks, no entertainment, no dating, nothing but more than 12 hours of study every day. I got much more comfortable with the coursework later on.

However, I still embarrassed myself countless times in interviews when I first started the job search process. The gap between a real data scientist and me was so huge that, even though I was hardworking, 12 weeks of study was far from enough to make a career transformation. So I applied, interviewed, failed, applied again, interviewed again, failed again. The good thing is, each time I learned something new and became a little bit stronger.

By March 2018, I had been unemployed for almost a year since quitting my previous job. With only ~$600 in my bank account, I had no idea how I would pay the next month’s rent. Even worse, if I couldn’t find a job by the end of April 2018, I would have to leave the U.S. because my visa was about to expire.

Luckily, after so much practice and repetition, I grew from someone who didn’t know how to introduce herself properly, couldn’t remember which of Lasso and Ridge is L1, and knew nothing about programming algorithms, into someone who knew she was ready to get what she wanted.

When I entered the final interview at Airbnb, I had one data scientist offer in hand, so I was not nervous at all. My goal for the final interview was to be the best version of myself and leave no regrets. The interview turned out to be the best one I had ever had. They gave me the offer, and all the hard work and sleepless nights paid off.

Read the source blog post at Toward Data Science.

Pre-built Analytic Modules Will Drive AI Revolution in Industry

By Bill Schmarzo, CTO, Big Data Practice of EMC Global Services

What is the Intelligence Revolution equivalent to the 1/4” bolt?

I asked this question in the blog “How History Can Prepare Us for Upcoming AI Revolution?” when trying to understand what history can teach us about technology-induced revolutions. One of the key capabilities of the Industrial and Information revolutions was the transition from labor-intensive, hand-crafted solutions to mass-manufactured ones. In the Information Revolution, it was the creation of standardized database management systems, middleware and operating systems. In the Industrial Revolution, it was the creation of standardized parts – like the ¼” bolt – that could be used to assemble rather than hand-craft solutions. So, what is the ¼” bolt equivalent for the AI Revolution? I think the answer is analytic engines, or modules!

Analytic Modules are pre-built engines – think Lego blocks – that can be assembled to create specific business and operational applications. These Analytic Modules would have the following characteristics:

  • pre-defined data input definitions and a data dictionary (so a module knows what type of data it is ingesting, regardless of the origin of the source system).
  • pre-defined data integration and transformation algorithms to cleanse, align and normalize the data.
  • pre-defined data enrichment algorithms to create higher-order metrics (e.g., reach, frequency, recency, indices, scores) necessitated by the analytic model.
  • algorithmic models (built using advanced analytics such as predictive analytics, machine learning or deep learning) that take the transformed and enriched data, run the algorithmic model and generate the desired outputs.
  • a layer of abstraction (perhaps using Predictive Model Markup Language, or PMML) above the predictive analytics, machine learning and deep learning frameworks that allows application developers to pick their preferred or company-mandated standards.
  • an orchestration capability to “call” the most appropriate machine learning or deep learning framework based upon the type of problem being addressed. See Keras, a high-level neural networks API written in Python and capable of running on top of popular machine learning frameworks such as TensorFlow, CNTK, or Theano.
  • pre-defined outputs (APIs) that feed the analytic results to downstream operational systems (e.g., operational dashboards, manufacturing, procurement, marketing, sales, support, services, finance).
In short, Analytic Modules produce pre-defined analytic results or outcomes while providing a layer of abstraction that enables the orchestration and optimization of the underlying machine learning and deep learning frameworks; a minimal interface sketch along these lines follows below.
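As a hedged sketch of what such a module’s contract might look like (the names and interface here are illustrative, not a standard API), the Lego-block idea reduces to declared inputs, built-in transform and enrichment steps, a swappable model, and pre-defined outputs:

```python
# Illustrative Analytic Module: declared inputs, built-in transform/enrich
# steps, a pluggable model, and a pre-defined output fed downstream.
from dataclasses import dataclass
from typing import Callable

import pandas as pd

@dataclass
class AnalyticModule:
    input_schema: dict                                  # pre-defined data input definitions
    transform: Callable[[pd.DataFrame], pd.DataFrame]   # cleanse/align/normalize
    enrich: Callable[[pd.DataFrame], pd.DataFrame]      # higher-order metrics
    model: Callable[[pd.DataFrame], pd.Series]          # the algorithmic core
    output_name: str = "score"                          # pre-defined output name

    def run(self, raw: pd.DataFrame) -> pd.DataFrame:
        missing = set(self.input_schema) - set(raw.columns)
        if missing:
            raise ValueError(f"input does not match schema, missing: {missing}")
        data = self.enrich(self.transform(raw))
        out = raw[[]].copy()                            # keep the original index only
        out[self.output_name] = self.model(data)
        return out

# Toy assembly: a "predictive maintenance" module scoring machines by usage
module = AnalyticModule(
    input_schema={"machine_id": int, "hours_run": float},
    transform=lambda df: df.fillna({"hours_run": 0.0}),
    enrich=lambda df: df.assign(usage_index=df["hours_run"] / 1000),
    model=lambda df: (df["usage_index"] > 5).astype(float),  # stand-in model
)
print(module.run(pd.DataFrame({"machine_id": [1, 2], "hours_run": [800.0, 6500.0]})))
```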
Monetizing IoT with Analytic Modules

The BCG Insights report titled “Winning in IoT: It’s All About the Business Processes” highlighted the top 10 IoT use cases that will drive IoT spending, including predictive maintenance, self-optimized production, automated inventory management, fleet management and distributed generation and storage (see Figure 1).

But these IoT applications will be more than just reports and dashboards that monitor what is happening. They’ll be “intelligent” – learning with every interaction to predict what’s likely to happen and prescribe corrective action to prevent costly, undesirable and/or dangerous situations – and the foundation for an organization’s self-monitoring, self-diagnosing, self-correcting and self-learning IoT environment.

While this is a very attractive list of IoT applications to target, treating any of these use cases as a single application is a huge mistake. It’s like the return of the big-bang IT projects of the ERP, MRP and CRM days, where tens of millions of dollars were spent in hopes that two to three years later, something of value would materialize.

Instead, these IoT “intelligent” applications will be comprised of analytic modules integrated to address the key business and operational decisions these applications need to support. For example, think of predictive maintenance as an assembly of analytic modules addressing the following predictive maintenance decisions:

  • identifying at-risk components and predicting their failure.
  • optimizing resource scheduling and staffing.
  • matching technicians and inventory to the maintenance and repair work to be done.
  • ensuring tool and repair equipment availability.
  • ensuring first-time-fix optimization.
  • optimizing parts and MRO inventory.
  • predicting component fixability.
  • optimizing the logistics of parts, tools and technicians.
  • leveraging cohort analysis to improve service and repair predictability.
  • leveraging event association analysis to determine how weather, economic and special events impact device and machine maintenance and repair needs.

As I covered in the blog “The Future Is Intelligent Apps,” the only way to create intelligent applications is to have a methodical approach that starts the predictive maintenance hypothesis development process with the identification, validation, valuing and prioritizing of the decisions (or use cases) that comprise these intelligent applications.

Read the source article in Data Science Central.

Data Science on a Budget: Audubon’s Advanced Analytics

On Memorial Day weekend 2038, when your grandchildren visit the California coast, will they be able to spot a black bird with a long orange beak called the Black Oystercatcher? Or will that bird be long gone? Will your grandchildren only be able to see that bird in a picture in a book or on a website?

A couple of data scientists at the National Audubon Society have been examining the question of how climate change will impact where birds live in the future, and the Black Oystercatcher has been identified as a “priority” bird — one whose range is likely to be impacted by climate change.

How did Audubon determine this? It’s a classic data science problem.

First, consider birdwatching itself, which is pretty much good old-fashioned data collection. Hobbyists go out into the field, identify birds by species and gender and sometimes age, and record their observations on their bird lists or bird books, and more recently on their smartphone apps.

Audubon itself has sponsored an annual crowdsourced data collection event for more than a century — the Audubon Christmas Bird Count — providing the organization with an enormous dataset of bird species and their populations in geographies across the country at specific points in time. The event is 118 years old and has produced one of the longest-running bird data sets in the world.

That’s one of the data sets that Audubon used in its project looking at the impact of climate change on bird species’ geographical ranges, according to Chad Wilsey, director of conservation science at Audubon, who spoke with InformationWeek. Wilsey is an ecologist, not a trained data scientist. But like many scientists, he uses data science as part of his work. In this case, as part of a team of two ecologists, he applied statistical modeling using technologies such as R to multiple data sets to create predictive models of future geographical ranges for specific bird species. The results are published in the 2014 report, Audubon’s Birds and Climate Change. Audubon also published interactive ArcGIS maps of species and ranges to its website.

The initial report used Audubon’s Christmas Bird Count data set and the North American Breeding Bird Survey from the US government. The report assessed geographic range shifts through the end of the century for 588 North American bird species during both the summer and winter seasons under a range of future climate change scenarios. Wilsey’s team built models based on climatic variables such as historical monthly temperature and precipitation averages and totals, using boosted regression trees, a machine learning technique. These models were built with bird observations and climate data from 2000 to 2009 and then evaluated against data from 1980 to 1999.
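The team worked in R, but as a rough Python analogue of the approach described, with hypothetical file and column names, the shape of the workflow looks like this:

```python
# Fit boosted trees on climate variables to predict species presence,
# training on one period and evaluating ("hindcasting") on another.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

CLIMATE_VARS = ["mean_temp", "temp_range", "total_precip"]  # illustrative

obs = pd.read_csv("bird_observations.csv")  # hypothetical: one row per survey site/year
train = obs[(obs.year >= 2000) & (obs.year <= 2009)]
test = obs[(obs.year >= 1980) & (obs.year <= 1999)]

model = GradientBoostingClassifier().fit(train[CLIMATE_VARS], train["species_present"])
auc = roc_auc_score(test["species_present"], model.predict_proba(test[CLIMATE_VARS])[:, 1])
print(f"hindcast AUC: {auc:.2f}")
```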

“We write all our own scripts,” Wilsey told me. “We work in R. It is all machine learning algorithms to build these statistical models. We were using very traditional data science models.”

Audubon did all this work on an on-premises server with 16 CPUs and 128 gigabytes of RAM.

Read the source article in InformationWeek.com.

Here are 3 Tips to Reduce Bias in AI-Powered Chatbots

AI-powered chatbots that use natural language processing are on the rise across all industries. A practical application is providing dynamic customer support that allows users to ask questions and receive highly relevant responses. In health care, for example, one customer may ask “What’s my copay for an annual check-up?” and another may ask “How much does seeing the doctor cost?” A smartly trained chatbot will understand that both questions have the same intent and provide a contextually relevant answer based on available data.
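A minimal sketch of that intent-matching idea, using TF-IDF similarity over example phrasings (real chatbot platforms use far richer NLP, and these intents and examples are made up):

```python
# Map differently worded questions to the same intent via nearest example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

examples = [
    ("copay", "What is my copay for an annual check-up?"),
    ("copay", "How much does seeing the doctor cost?"),
    ("hours", "When is the clinic open?"),
    ("hours", "What are your office hours?"),
]
labels = [label for label, _ in examples]
texts = [text for _, text in examples]

vectorizer = TfidfVectorizer().fit(texts)
example_vectors = vectorizer.transform(texts)

def classify(question: str) -> str:
    similarities = cosine_similarity(vectorizer.transform([question]), example_vectors)[0]
    return labels[similarities.argmax()]

print(classify("what will a yearly doctor visit cost"))  # -> copay
```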

What many people don’t realize is that AI-powered chatbots are like children: They learn by example. Just like a child’s brain in early development, AI systems are designed to process huge amounts of data in order to form predictions about the world and act accordingly. AI solutions are trained by humans and synthesize patterns from experience. However, there are many patterns inherent in human societies that we don’t want to reinforce — for example, social biases. How do we design machine learning systems that are not only intelligent but also egalitarian?

Social bias is an increasingly important conversation in the AI community, and we still have a lot of work to do. Researchers from the University of Massachusetts recently found that the accuracy of several common NLP tools was dramatically lower for speakers of “non-standard” varieties of English, such as African American Vernacular English (AAVE). Another research group, from MIT and Stanford, reported that three commercial face-recognition programs demonstrated both skin-type and gender biases, with significantly higher error rates for females and for individuals with darker skin. In both of these cases, we see the negative impact of training a system on a non-representational data set. AI can learn only as much as the examples it is exposed to — if the data is biased, the machine will be as well.

Bots and other AI solutions now assist humans with thousands of tasks across every industry, and bias can limit a consumer’s access to critical information and resources. In the field of health care, eradicating bias is critical. We must ensure that all people, including those in minority and underrepresented populations, can take advantage of tools that we’ve created to save them money, keep them healthy, and help them find care when they need it most.

So, what’s the solution? Based on our experience training with IBM Watson for more than four years, you can minimize bias in AI applications by considering the following suggestions:

  • Be thoughtful about your data strategy;
  • Encourage a representational set of users; and
  • Create a diverse development team.
1. Be thoughtful about your data strategy

When it comes to training, AI architects have choices to make. The decisions are not only technical, but ethical. If our training examples aren’t representative of our users, we’re going to have low system accuracy when our application makes it to the real world.

It may sound simple to create a training set that includes a diverse set of examples, but it’s easy to overlook if you aren’t careful. You may need to go out of your way to find or create datasets with examples from a variety of demographics. At some point, we will also want to train our bot on data examples from real usage, rather than relying on scraped or manufactured datasets. But what do we do if even our real users don’t represent all the populations we’d like to include?

We can take a laissez-faire approach, allowing natural trends to guide development without editing the data at all. The benefit of this approach is that you can optimize performance for your general population of users. However, that may come at the expense of an underrepresented population that we don’t want to ignore. For example, if the majority of users interacting with a chatbot are under the age of 65, the bot will see very few questions about medical services that apply only to an over-65 population, such as osteoporosis screenings and fall prevention counseling. If the bot is trained only on real interactions, with no additional guidance, it may not perform as well on questions about those services, which disadvantages older adults who need that information.

To combat this at my company, we create synthetic training questions or seek another data source for questions about osteoporosis screenings and fall prevention counseling, as sketched below. By strategically enforcing more distribution and representativeness in our training data, we allow our bot to learn a wider range of topics without unfair preference for the interests of the majority user demographic.
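A simplified sketch of that rebalancing step (the intents, threshold and examples are hypothetical; in practice the added questions would be newly written synthetic phrasings or drawn from another source, not literal duplicates):

```python
# Upsample underrepresented intents before retraining the chatbot.
import random
from collections import Counter

training = [
    ("copay_checkup", "what's my copay for a checkup"),
    ("copay_checkup", "how much is an annual visit"),
    ("copay_checkup", "checkup cost under my plan"),
    ("osteoporosis_screening", "do you cover bone density tests"),
]
MIN_EXAMPLES = 3  # assumed floor per intent

counts = Counter(label for label, _ in training)
for intent, count in counts.items():
    if count < MIN_EXAMPLES:
        pool = [ex for ex in training if ex[0] == intent]
        training += random.choices(pool, k=MIN_EXAMPLES - count)  # stand-in for synthetic questions

print(Counter(label for label, _ in training))  # every intent now has >= 3 examples
```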

Read the source article in VentureBeat.

Here Are 10 Free Must-Read Books for Machine Learning and Data Science

By Matthew Mayo, KDnuggets

Summer, summer, summertime. Time to sit back and unwind. Or get your hands on some free machine learning and data science books and get your learn on. Check out this selection to get you started.

1. Python Data Science Handbook
By Jake VanderPlas

The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it’s a fast-paced introduction to the Python language aimed at researchers and scientists.

2. Neural Networks and Deep Learning
By Michael Nielsen

Neural Networks and Deep Learning is a free online book. The book will teach you about:

– Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data

– Deep learning, a powerful set of techniques for learning in neural networks

Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you many of the core concepts behind neural networks and deep learning.

3. Machine Learning & Big Data
By Kareem Alkaseer

This is a work in progress, which I add to as time allows. The purpose behind it is to strike a balance between theory and implementation, so that a software engineer can implement machine learning models comfortably without relying too much on libraries. Most of the time the concept behind a model or a technique is simple or intuitive, but it gets lost in details or jargon. Also, most of the time existing libraries will solve the problem at hand, but they are treated as black boxes, and more often than not they have their own abstractions and architectures that hide the underlying concepts. This book attempts to make those underlying concepts clear.

5. Statistical Learning with Sparsity: The Lasso and Generalizations
By Trevor Hastie, Robert Tibshirani, Martin Wainwright

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. This book describes the important ideas in these areas in a common conceptual framework.

6. Statistical inference for data science
By Brian Caffo

This book is written as a companion book to the Statistical Inference Coursera class as part of the Data Science Specialization. However, if you do not take the class, the book mostly stands on its own. A useful component of the book is a series of YouTube videos that comprise the Coursera class.

The book is intended to be a low-cost introduction to the important field of statistical inference. The intended audience is students who are numerically and computationally literate and would like to put those skills to use in data science or statistics. The book is offered for free as a series of markdown documents on GitHub and in more convenient forms (epub, mobi) on LeanPub and retail outlets.

Read the complete source list at KDnuggets.com.