2018-06-29
Data center automation is inevitable. This article explains how to get started.
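Rich Rogers, senior vice president of product and engineering at Hitachi Vantara, envisions a data center in which AI-driven management software, some or all of it cloud-based, seamlessly spans single or multiple sites to monitor and control IT infrastructure, facilities and applications. Compute, power, storage, networking and cooling operations will adjust dynamically to achieve maximum efficiency, productivity and availability. Human operators, meanwhile, will be freed to do what they do best: plan new capabilities and innovate.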
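"IoT and AI will enable data center issues to be root-caused and resolved automatically by software," Rogers said. Data center administrators will no longer have to be woken up at night to troubleshoot outages. "Voice technologies will enable data center operators to monitor and manage their data centers from any location, be [they] at the grocery store, gym or living room couch," he predicted. IT equipment will be deployed and maintained autonomously. "You simply stock new compute nodes and disk drives and robotics [will] streamline the technology to the appropriate systems," Rogers explained.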
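The long-term goal of AI-driven automation is to push IT managed services toward zero downtime. "Over time we expect the traditional SLA model (99.xx availability, etc.) will have no meaning as the system is always on, compliant, secure, agile and flexible," said Satheesh Kumar, IBM's vice president of hybrid services, AI platform. (Translator's note: SLA stands for service-level agreement, a contract between a network service provider and a customer that defines terms such as the type of service, the quality of service and customer payment.)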
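Data center infrastructure management is currently highly reactive, because disruptions and delays arrive unexpectedly. AI aims to fix this. "As infrastructure becomes increasingly vital and complex, this resource-intensive approach won't work," said Milan Shetti, general manager of Hewlett Packard Enterprise's storage division. "It's no longer acceptable to find out about a disruption after it has occurred or spend the resources to resolve them; that's the opportunity for AI."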
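A growing number of smart sensors can collect data from various data center elements and relay critical insights into mechanical, electrical and environmental conditions.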
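"This data can be then used by sophisticated algorithms to analyze any potential problems or anomalies in the whole system, and warn data center managers well in advance," noted Param Vir Singh, associate professor of business technologies at Carnegie Mellon University's Tepper School of Business.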
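Integrating AI into data center operations requires first defining and implementing an automation strategy, then selecting an initial key use case. Adopters can focus on rules-based systems that act on correlations and patterns, or follow a machine learning path to generate predictions and then automate actions based on those predictions.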
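"Either way, humans-in-the-loop will be critical to ensure rules are accurate and machine learning models are performing adequately," said Stanton Jones, a director and analyst at technology market research firm Information Services Group (ISG).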
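Time-consuming, repetitive IT tasks are ideal starting points for an AI automation initiative. "Server restarts, low disk space remediation [and] password resets are great examples of tasks that benefit from AI and can easily be automated in a data center," explained Gabby Nizri, co-founder and CEO of Ayehu, an IT automation and orchestration platform developer. "Additionally, look at using AI automation in the data center to ensure IT compliance and unify policy management across all business services."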
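Automation matters most when something needs to be done fast. "Organizations ought to identify their key troubleshooting workflows, as those will lend themselves quite naturally to automated outcome," said Sumeet Singh, vice president of engineering at Juniper Networks. "It's also important to focus on targeting one workflow at a time, diligently building automated outcomes into data center processes."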
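Tom Coughlin, IEEE Senior Member and president of Coughlin Associates, a data storage consulting firm in Atascadero, Calif., recommends experimenting with AI services in sandbox environments before considering a full deployment. "Be sure to have a scalable plan that starts with some features and adds other capabilities as your time and budget allows. Make sure that you ask good questions of VARs [value-added resellers] and consultants on how to best implement AI in data centers," he added.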
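It is also important to recognize that the benefits of AI tend to improve over time. "The more data fed into AI systems, the better it can make recommendations and facilitate better outcomes," explained Dell EMC CTO John Roese. "An additional benefit in deploying AI is helping technical staff improve their skills and become more productive as they learn and adapt to the new technology."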
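Whatever approach is taken, caution is warranted when deploying an AI-driven service. "Don't hand over decision-making power to AI until you are comfortable that it is making the right decision every time," warned Marcel Shaw, a systems engineer with Ivanti, a unified IT management software developer. It is important to give AI technology time to learn and to integrate gradually into its environment. "In starting out with AI, have it provide recommendations instead of letting it take action by itself. It only takes a single wrong decision to bring your system down, which could result in significant unexpected costs from lost revenue or reduced productivity," he noted.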
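The quality of the data collected by smart sensors also matters. "If you don't have good data, you can't have good AI," Carnegie Mellon's Singh noted. Remember, too, that AI is primarily backward-looking. "It looks at the past data and learns patterns in the data to predict something. Therefore, it is susceptible to newer, unseen anomalies," he cautioned.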
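Another major pitfall is lacking an IT team that can effectively manage the AI system and turn its insights into maximum value. "AI automation in data centers gives organizations powerful capabilities and insights, but without a team to manage the system and leverage those insights, organizations will be less likely to take full advantage of AI," Roese noted.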
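"Organizations shouldn't underestimate the time, process knowledge and data required to effectively implement AI systems," Jones said. "Also, organizations should be aware that this is very new technology and getting started now means being on the leading edge of this transformation."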
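Adopters must also see through vendor hype to judge AI's potential benefits accurately. "An AI system should have a history of proven success before it can be trusted to make decisions and take actions," Shetti said. "You're not going to let a car with only 15 self-driven miles take control, and you should be wary of AI for infrastructure in the same manner."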
2/21/2018
02:00 PM
Data center automation is inevitable. Here's how to do it right from the beginning.
Rich Rogers, a senior vice president of product and engineering at Hitachi Vantara, envisions a data center in which AI-driven management software (some or all of it cloud-based) will monitor and control IT and facilities infrastructure, as well as applications, seamlessly and completely across single or multiple sites. Compute, power, storage, networking and cooling operations will flex dynamically to achieve maximum efficiency, productivity and availability. Human operators, meanwhile, will be free to do what they do best: plan new capabilities and innovate improvements.
"IoT and AI will enable data center issues to be root-caused and resolved automatically by software," Rogers said. Data center administrators will no longer be woken up at night to troubleshoot outages. "Voice technologies will enable data center operators to monitor and manage their data centers from any location, be [they] at the grocery store, gym or living room couch," he predicted. IT infrastructure gear will be deployed and maintained autonomously. "You simply stock new compute nodes and disk drives and robotics [will] streamline the technology to the appropriate systems," Rogers explained.
AI-driven automation's long-term goal is to drive IT managed services toward zero downtime. "Over time we expect the traditional SLA model—99.xx availability, etc.—will have no meaning as the system is always on, compliant, secure, agile and flexible," advised Satheesh Kumar, IBM's vice president of hybrid services, AI platform.
Data center infrastructure management is currently highly reactive due to the unexpected arrival of disruptions and delays. AI aims to fix this. "As infrastructure becomes increasingly vital and complex, this resource-intensive approach won’t work," observed Milan Shetti, general manager of Hewlett Packard Enterprise's storage division. "It’s no longer acceptable to find out about a disruption after it has occurred or spend the resources to resolve them—that’s the opportunity for AI."
A rapidly growing number of smart sensors are becoming available to receive data from various data center elements, relaying critical insights into mechanical, electrical and environmental conditions.
"This data can be then used by sophisticated algorithms to analyze any potential problems or anomalies in the whole system, and warn data center managers well in advance," noted Param Vir Singh, associate professor of business technologies at Carnegie Mellon University's Tepper School of Business.
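One simple way to picture the kind of analysis Singh describes is a rolling check that compares each new sensor reading with the recent past. The sketch below is only an illustration under stated assumptions: the function name `find_anomalies`, the window size, the threshold and the sample inlet temperatures are all hypothetical, and a rolling z-score is a basic statistical technique, not the method used by any vendor named in this article.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far outside the trailing window.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)  # trailing window of earlier readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies


# Hypothetical inlet temperatures in degrees Celsius with one sudden spike.
inlet_temps_c = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 27.5, 22.0]
print(find_anomalies(inlet_temps_c))  # prints [6]
```

Because the check only looks at recent history, it reflects the article's later caution that such methods learn from past data and can misjudge conditions they have never seen.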
Getting Started
Integrating AI into data center operations begins with defining and implementing an automation strategy, then picking an initial key use case. Adopters can either focus on rules-based systems to take action on correlations/patterns or follow a machine learning path to develop predictions, and then automate actions based on those predictions.
"Either way, humans-in-the-loop will be critical to ensure rules are accurate and machine learning models are performing adequately," said Stanton Jones, a director and analyst at technology market research firm Information Services Group (ISG).
Time-consuming and repetitive IT tasks are ideal starting points for an AI automation initiative. "Server restarts, low disk space remediation [and] password resets are great examples of tasks that benefit from AI and can easily be automated in a data center," explained Gabby Nizri, co-founder and CEO of Ayehu, an IT automation and orchestration platform developer. "Additionally, look at using AI automation in the data center to ensure IT compliance and unify policy management across all business services," he suggested.
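As a rough illustration of a rules-based starting point for one of the tasks Nizri lists, the sketch below flags low disk space. The function name `disk_rule`, the 10% free-space threshold and the returned labels are assumptions made for this example, not values from the article or from any product.

```python
import shutil


def disk_rule(total_bytes, free_bytes, min_free_fraction=0.10):
    """Apply a single rule: flag the volume when free space is too low."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = free_bytes / total_bytes
    if free_fraction < min_free_fraction:
        return "remediate"  # e.g. open a ticket or trigger a cleanup job
    return "ok"


# Check the volume that holds the current working directory.
usage = shutil.disk_usage(".")
print(disk_rule(usage.total, usage.free))
```

Keeping the rule separate from the measurement makes it easy to test the rule with known numbers before wiring it to any real remediation step.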
Automation matters most when there's a need to do something fast. "Organizations ought to identify their key troubleshooting workflows, as those will lend themselves quite naturally to automated outcome," advised Sumeet Singh, vice president of engineering at Juniper Networks. "It’s also important to focus on targeting one workflow at a time, diligently building automated outcomes into data center processes."
Experiment with AI services in sandbox environments before considering a full deployment, suggested Tom Coughlin, IEEE Senior Member and president of Coughlin Associates, a data storage consulting firm located in Atascadero, Calif. "Be sure to have a scalable plan that starts with some features and adds other capabilities as your time and budget allows," he added. "Make sure that you ask good questions of VARs and consultants on how to best implement AI in data centers."
It is also important to recognize that AI benefits tend to improve over time. "The more data fed into AI systems, the better it can make recommendations and facilitate better outcomes," explained Dell EMC CTO John Roese. "An additional benefit in deploying AI is helping technical staff improve their skills and become more productive as they learn and adapt to the new technology."
Avoiding Pitfalls
Regardless of the approach taken, it's always a good idea to exercise caution when deploying an AI-driven service. "Don’t hand over decision-making power to AI until you are comfortable that it is making the right decision every time," warned Marcel Shaw, a federal systems engineer with Ivanti, a unified IT management software developer. It's important to give AI technology the time to learn and grow into its environment. "In starting out with AI, have it provide recommendations instead of letting it take action by itself," he suggested. "It only takes a single wrong decision to bring your system down, which could result in significant unexpected costs from lost revenue or reduced productivity."
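Shaw's advice to begin with recommendations rather than actions can be pictured as a switch that stays off by default. The sketch below is a minimal illustration under assumptions: the `Advisor` class, its `autonomous` flag and the example action are hypothetical names invented here, not part of any product mentioned in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Advisor:
    """Route proposed actions to a review queue unless autonomy is enabled."""

    autonomous: bool = False  # recommend-only by default
    recommendations: List[str] = field(default_factory=list)

    def handle(self, description: str, action: Callable[[], None]) -> bool:
        """Run the action only in autonomous mode; otherwise record it.

        Returns True if the action was executed, False if it was queued
        as a recommendation for a human operator to review.
        """
        if self.autonomous:
            action()
            return True
        self.recommendations.append(description)
        return False


advisor = Advisor()
executed = advisor.handle("restart web-01", lambda: print("restarting web-01"))
print(executed, advisor.recommendations)  # prints False ['restart web-01']
```

An operator would flip `autonomous` to True for a given workflow only after its recommendations have proven reliable over time.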
The quality of the data collected by smart sensors also matters. "If you don’t have good data, you can’t have good AI," Carnegie Mellon's Singh observed. Remember, too, that AI is primarily backward-looking. "It looks at the past data and learns patterns in the data to predict something," he cautioned. "Therefore, it is susceptible to newer, unseen anomalies."
Another major pitfall is lacking an IT team that can effectively manage the AI system and interpret insights to their maximum value. "AI automation in data centers gives organizations powerful capabilities and insights, but without a team to manage the system and leverage those insights, organizations will be less likely to take full advantage of AI," Roese advised.
"Organizations shouldn't underestimate the time, process knowledge and data required to effectively implement AI systems," Jones said. "Also, organizations should be aware that this is very new technology and getting started now means being on the leading edge of this transformation."
Adopters must also look beyond vendor hype to accurately judge AI's potential benefits. "An AI system should have a history of proven success before it can be trusted to make decisions and take actions," Shetti said. "You’re not going to let a car with only 15 self-driven miles take control, and you should be wary of AI for infrastructure in the same manner."
Attachments:
《How to Begin Integrating AI into Data Center Operations》--原文.pdf (original text)
《How to Begin Integrating AI into Data Center Operations》--译文.pdf (translation)
