2018-07-26
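As with other emerging technologies, big data analytics has long been shrouded in mystery. This article debunks eight big data analytics myths to help you advance your analytics strategy.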
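There is little doubt that the concept of big data analytics has been misunderstood again and again over the years. Early adopters struggled in many areas, which ultimately led to higher-than-expected failure rates and a poor return on investment. Many of those past mistakes have long since been overcome, yet a number of myths about big data analytics concepts and implementation steps remain.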
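Despite a less-than-stellar track record (or at least the perception of one), big data remains a big deal. A forecast IDC released in the third quarter of 2016 showed the big data and analytics market reaching double-digit year-on-year growth. If that is true, then many of the views still circulating about big data analytics must be wrong, right?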
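Myths last because they contain a grain of truth that makes it hard to tell fact from fiction. The same is true of complex technologies, which are often overhyped and end up being adopted more slowly than expected. Big data is one such technology; other examples include software-defined WANs (SD-WAN), IT security and even cloud computing. Yet if a technology is the right fit for enterprises, its myths are eventually overcome and the truth comes to light.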
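Today we will look at eight myths surrounding big data and analytics. As you flip through the slides, try to work out where the truth was distorted to the point that a myth formed. This is the best way to tear down a myth and get back to reality. In most cases, misconceptions about big data or analytics stem from errors in judgment by early adopters. In other cases, myths arose because enterprise IT departments lacked the skills and tools needed to run big data projects. Finally, some myths stem from misinformation and miscommunication about the concepts and components of big data architectures.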
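One of the biggest myths in the early days of big data was that enterprises should keep all the data they collect. Those who did so faced the task of storing all of it at the lowest possible cost. Many turned to cloud-based archival storage such as Amazon Glacier or Google Coldline Storage. These are indeed excellent low-cost solutions for disaster-recovery archives, but they are not the right place for data analytics. Ultimately, people found that the real benefit of big data lies in real-time analysis and reporting on recently acquired information.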
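When you talk with people who are not fully informed about big data analytics, you will find that many IT leaders believe it is too expensive. This is probably because big data first became popular at the largest enterprises, such as Facebook, Microsoft and Wal-Mart. That led many to believe that only large companies could use the technology. This may have been true early on. Today, however, cloud-based big data and analytics solutions let small and midsize companies start analyzing big data with relatively low start-up costs and scale up as needed.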
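If you think the window for using big data analytics to gain a competitive advantage has already closed, I have some news for you. Big data will be part of enterprise IT for a long time. If you plan to skip big data and wait for the next big technology wave, you will be waiting a long time. The truth is that the next innovations in data analysis will be nothing more than evolutions of the big data foundations many organizations already have in place. That said, recent advances in tools and platforms mean there is still time to catch up. This is especially true in artificial intelligence and automation, which can dramatically reduce the time and effort needed to act on the results of big data analysis.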
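Before starting any big data project, it is a good idea to define your general goals, but you do not need to know the exact questions in advance. Analytics often helps answer questions or solve problems you never considered. If you focus too narrowly on getting specific answers, you may miss important insights that turn out to be more useful than you expected. Even when such insights seem strange, they can still be very useful.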
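In 2017, one of the biggest concerns about big data projects was the shortage of data scientists frequently reported in the media. This led many to believe that only people with a data science background can work as data analysts. In other words, you supposedly cannot turn an IT professional who writes code or manages virtual machines into an analytics expert. The learning curve for data analysis is steep. Even so, it stands to reason that anyone with an enterprise IT background (and perhaps an introverted personality) has an advantage over others trying to move into data analytics.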
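As the saying goes, "you don't know what you don't know," and this holds true in data analytics. In many cases, business leaders believe they know their business and their market intimately. They therefore assume big data analytics has little to teach them. This is less a myth about big data than a problem of breaking with conventional thinking. Experience has shown that data-driven organizations gain key business insights that can give them a real competitive advantage.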
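"All analytics platforms are the same; the only difference is what you are looking for." This view could not be more wrong, and it shows a lack of research into big data and analytics. The best way to prove it is to run the same data through several platforms and look for the same answers. You will probably be surprised at how different the results are. Use what you learn to choose the platform that best fits your data and your business.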
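Enterprise data warehouses (EDWs) have been around for a long time. So when the term "data lake" appeared, many assumed it was just a trendy buzzword for the EDW. "Data lake" does sound like a term invented in a marketing firm's conference room. Even so, there are clear differences between EDWs and data lakes, mainly in how they store structured versus unstructured data.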
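In today's fast-paced world of enterprise IT, being able to separate fact from myth is essential. Believing myths can lead IT departments to delay adopting technologies that could truly move the business forward. That is why we, as a community, are committed to identifying and debunking myths about every technology, including big data.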
https://www.informationweek.com/big-data/debunking-8-big-data-and-analytics-myths/d/d-id/1329930
9/21/2017
07:00 AM
As with other emerging tech concepts, big data and analytics are haunted by myths. Here are eight such myths that you will want to dispel as you advance your analytics strategy.
There's little doubt that the concept of big data analytics has been dragged through the mud multiple times over the years. Early adopters struggled in many areas, which led to higher-than-expected failure rates and, ultimately, a poor return on investment. Yet many of the mistakes of the past have long since been overcome. What remains, however, are a number of myths about concepts and implementation steps that some still believe reflect the truth.
Despite the less-than-stellar track record (or perception), big data remains a big deal. A forecast IDC released in the third quarter of 2016 showed the big data and analytics market hitting double-digit year-on-year growth rates. If this is true, then many of those scary myths still floating around almost certainly must be wrong. Right?
The thing about the best and longest-lasting myths, legends and lore is that there is always a nugget of truth that keeps the falsehood going. This is commonly the case with complex technologies that are overhyped and ultimately adopted more slowly than expected. Big data is one of those technologies, but it's not the only one. Other recent examples of technologies that have attracted negative myths include software-defined WANs (SD-WAN), IT security and even cloud computing. Yet, if the technology is ultimately the right fit for enterprise organizations, the myths are eventually overcome and the truth is exposed.
Today, we're going to look at eight such myths that have come out of the big data and analytics movement. As you're flipping through the slides, try to figure out where the truth became skewed to the point where the fallacy was formed. This is the best way to tear down the myth and bring reality back into the picture. In most cases, a misconception about some aspect of big data or analytics was due to an error in judgment made by a number of early adopters. In other situations, myths formed because enterprise IT departments lacked the skills and tools required to run a big data project. Finally, a few fallacies arose from simple misinformation and miscommunication about the concepts and components of big data architectures.
Keep all your data; you might need it one day
One of the biggest myths to emerge during the early days of the big data movement is that enterprise organizations should hold onto every scrap of data they could ever collect. Those who went down that path were faced with figuring out where to store it all at the lowest cost. Many turned to cloud-based data archiving services such as Amazon Glacier or Google Coldline Storage. These are indeed excellent low-cost options for disaster-recovery archives, but they are not the right place for data analytics. Ultimately, it became clear that the true benefits of big data come from real-time analysis and reporting on recently acquired information.
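The article doesn't prescribe a retention policy. It does draw a line between data kept close at hand for analysis and data pushed to low-cost archives. As a minimal sketch of that line, the Python below partitions records by age. The `Record` type, the `split_by_tier` helper and the 30-day window are hypothetical choices for illustration only. No cloud storage API is called; a real pipeline would hand the archive set to whatever archiving service it uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical cut-off: records newer than this stay in the "hot" analytics tier.
HOT_WINDOW = timedelta(days=30)


@dataclass
class Record:
    record_id: str
    created_at: datetime


def split_by_tier(records, now, hot_window=HOT_WINDOW):
    """Partition records into a hot tier (recent, queried for analysis)
    and an archive tier (older, candidates for cold storage)."""
    hot, archive = [], []
    for rec in records:
        if now - rec.created_at <= hot_window:
            hot.append(rec)
        else:
            archive.append(rec)
    return hot, archive


if __name__ == "__main__":
    now = datetime(2017, 9, 21, tzinfo=timezone.utc)
    records = [
        Record("a", now - timedelta(days=2)),
        Record("b", now - timedelta(days=45)),
        Record("c", now - timedelta(days=29)),
    ]
    hot, archive = split_by_tier(records, now)
    print([r.record_id for r in hot])      # ['a', 'c']
    print([r.record_id for r in archive])  # ['b']
```

The window is a parameter so that the boundary between "analyze now" and "archive" stays a business decision rather than a constant baked into the code.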
Big data analytics are far too expensive
When you discuss big data with people who aren’t fully informed, you often come away with the sense that many IT leaders feel they can’t afford the cost of getting started. This likely came about because big data first became popular with the largest enterprise organizations. Story after story about big data being leveraged at companies such as Facebook, Microsoft and Wal-Mart led many to believe that the technology was attainable only by the largest organizations. While this may have been true very early on, cloud-based big data and analytics solutions now allow companies to start small and scale their big data ambitions as needed, with relatively low start-up costs.
We’re already too far behind the big data curve
If you’re in the camp that thinks your window of opportunity to use big data and analysis reporting to gain a competitive advantage has long since closed, I’ve got some news for you. Big data is going to be part of enterprise IT for a long time. If you think that you’re going to pass on big data and wait for the next big technology wave to ride, you’re going to be waiting for a long time. The truth is, the next innovations in data analysis are going to be nothing more than evolutions of a big data foundation that many already have in place. That being said, there’s still plenty of time to catch up given the advancements in tools and platforms that have cropped up over the last few years. This is especially true in the fields of artificial intelligence and automation that can dramatically reduce the time and effort spent capitalizing on information gleaned from big data analysis.
You need to know what questions to ask prior to starting a big data project
While it’s always a good idea to start any big data project with a general idea of what you’re trying to achieve, you don’t need to know the exact questions you want answered. Analytics can often help answer questions or solve problems you never even considered. If you’re too focused on reporting very specific answers, you may miss critical insights that end up being more useful than you thought. Even when insights seem bizarre, they can still be very useful.
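Here is a toy sketch of open-ended analysis that poses no specific question up front. It computes the Pearson correlation for every pair of numeric columns and reports the strong ones. Correlation scanning is only one assumed example of exploratory analysis, chosen for brevity. The column names, values and the 0.8 threshold are invented. Any hit is a lead to investigate, not proof of a causal link.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def scan_correlations(
    columns: Dict[str, List[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Test every pair of columns, with no question posed in advance, and
    return the pairs whose |r| meets the threshold, strongest first."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


if __name__ == "__main__":
    # Invented weekly metrics for a single store.
    data = {
        "ad_spend": [10, 12, 9, 15, 20, 18],
        "foot_traffic": [100, 110, 95, 130, 160, 150],
        "rainfall_mm": [5, 0, 12, 3, 1, 7],
    }
    for a, b, r in scan_correlations(data):
        print(f"{a} ~ {b}: r={r}")
```

In practice a data-processing library would replace the hand-written loop. A scan over many columns would also need a correction for the number of comparisons, so that chance correlations aren't mistaken for insight.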
IT staff can't easily move into data analytics roles
One of the biggest concerns surrounding big data projects in 2017 is the shortage of data scientists often reported in the media. This news has led many to believe that data analysts are only effective if they have a background rooted in data science. In other words, you can’t take an IT pro who’s coding or managing virtual machines and turn them into an analytics expert. The learning curve for data analysis is indeed steep. Even so, it stands to reason that anyone with a background in enterprise IT (and perhaps an introvert) has a leg up on others seeking to get into the data analytics game.
Analytics results will only confirm what you already know
The old saying “you don’t know what you don’t know” rings true in the world of data analysis. In many cases, business leaders feel that they know their business and their market intimately. They therefore assume there’s very little to be learned from big data analytics exercises. This is more a problem of breaking the mold of conventional business wisdom than a true myth about big data. Ultimately, it’s been proven many times over that data-driven organizations gain key business insights that can provide true competitive advantages.
All analytics platforms are the same
“Analytics platforms are all the same. The difference is in what you’re looking for.” These statements couldn’t be further from the truth and show a lack of research in the field of big data and analytics. The best way to prove this is to pilot several platforms using the same data and looking for the same answers. You’ll likely be surprised at how different the results are. Use this information to find the right fit for your data and business vertical.
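A pilot of this kind can be as simple as running one question over one dataset through each candidate and lining up the answers. In the sketch below, the two "platforms" are hypothetical Python functions, not real products. They differ only in how they treat missing values, which is one assumed, illustrative reason why results can diverge.

```python
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, Optional[float]]]


# Hypothetical stand-ins for two platforms answering the same question
# ("what is the average order value?") with different defaults.
def platform_a_avg(rows: Rows) -> float:
    # Skips rows where the value is missing.
    values = [r["order_value"] for r in rows if r["order_value"] is not None]
    if not values:
        raise ValueError("no non-missing values")
    return sum(values) / len(values)


def platform_b_avg(rows: Rows) -> float:
    # Treats missing values as zero.
    values = [r["order_value"] or 0.0 for r in rows]
    return sum(values) / len(values)


def run_pilot(rows: Rows, platforms: Dict[str, Callable[[Rows], float]]) -> Dict[str, float]:
    """Run the same question against every platform on the same data."""
    return {name: fn(rows) for name, fn in platforms.items()}


if __name__ == "__main__":
    rows = [{"order_value": 40.0}, {"order_value": None}, {"order_value": 80.0}]
    results = run_pilot(rows, {"A": platform_a_avg, "B": platform_b_avg})
    print(results)  # {'A': 60.0, 'B': 40.0}
```

Keeping the data and the question fixed while swapping only the platform means any difference in the output can be traced back to the platform itself.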
Data lake is just a fancy name for data warehouse
Enterprise data warehouses (EDWs) have been around for a long time. So, when the term data lake began cropping up, many assumed it was simply a buzzword used to spice up the concept of the EDW. While data lake does sound like a term concocted in the conference room of a marketing firm, there are distinct differences between EDWs and data lakes, largely revolving around structured vs. unstructured data storage methods.
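One common way to describe the difference is "schema-on-write" for warehouses versus "schema-on-read" for lakes:

- A warehouse validates records against a fixed schema at load time.
- A lake stores raw payloads, structured or not, and applies structure only when a query runs.

The sketch below illustrates that framing, with in-memory Python lists standing in for both stores. The schema, field names and helper functions are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical fixed schema for the warehouse-style table.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}


def warehouse_insert(table: List[Dict[str, Any]], record: Dict[str, Any]) -> bool:
    """Schema-on-write: validate before storing; reject anything that doesn't fit."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        return False  # missing or extra fields
    if not all(isinstance(record[k], t) for k, t in WAREHOUSE_SCHEMA.items()):
        return False  # wrong field types
    table.append(record)
    return True


def lake_insert(lake: List[str], raw: str) -> None:
    """Schema-on-read: keep the raw payload exactly as received."""
    lake.append(raw)


def lake_query_amounts(lake: List[str]) -> List[float]:
    """Apply structure only at query time, skipping payloads this query can't use."""
    amounts = []
    for raw in lake:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured text, e.g. a log line
        if isinstance(doc, dict) and isinstance(doc.get("amount"), (int, float)):
            amounts.append(float(doc["amount"]))
    return amounts


if __name__ == "__main__":
    table: List[Dict[str, Any]] = []
    lake: List[str] = []
    payloads = [
        '{"customer_id": 1, "amount": 19.5}',
        '{"customer_id": 2, "amount": 5, "coupon": "SPRING"}',
        "GET /checkout 200 312ms",
    ]
    for p in payloads:
        lake_insert(lake, p)  # the lake accepts everything
        try:
            warehouse_insert(table, json.loads(p))
        except json.JSONDecodeError:
            pass  # a raw log line never reaches the warehouse
    print(len(table), len(lake))     # 1 3
    print(lake_query_amounts(lake))  # [19.5, 5.0]
```

The trade-off shows up in the output. The warehouse holds only one clean row, while the lake keeps all three payloads and lets a later query decide what it can use.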
Conclusion
It’s important to be able to separate fact from fiction in this fast-paced world of enterprise IT. Believing in myths can paralyze an IT department into holding off too long on technologies that truly can propel a business forward. That’s why it’s so critical that as a community, we identify and knock down myths in all technologies, including big data.
Attachments:
《Debunking 8 Big Data and Analytics Myths》--原文.pdf (original English article)
《Debunking 8 Big Data and Analytics Myths》--译文.pdf (Chinese translation)
