Big Data






Abstract:


Big data is often characterized by the 3Vs: the extreme volume of data, the wide variety of data types, and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time.


BREAKING DOWN THE 3Vs IN BIG DATA


Such voluminous data can come from myriad different sources, such as business sales records, the collected results of scientific experiments or real-time sensors used in the Internet of Things. Data may be raw or preprocessed using separate software tools before analytics are applied. Data may also exist in a wide variety of file types, including structured data, such as SQL database stores; unstructured data, such as document files; or streaming data from sensors. Further, big data may involve multiple, simultaneous data sources, which may not otherwise be integrated. For example, a big data analytics project may attempt to gauge a product's success and future sales by correlating past sales data, return data and online buyer review data for that product.

Big data infrastructure demands


The need for big data velocity imposes unique demands on the underlying compute infrastructure. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster. Organizations must apply adequate compute power to big data tasks to achieve the desired velocity. This can potentially demand hundreds or thousands of servers that can distribute the work and operate collaboratively.
Achieving such velocity in a cost-effective manner is also a headache. Many enterprise leaders are reluctant to invest in an extensive server and storage infrastructure that might only be used occasionally to complete big data tasks. As a result, public cloud computing has emerged as a primary vehicle for hosting big data analytics projects. A public cloud provider can store petabytes of data and scale up thousands of servers just long enough to accomplish the big data project. The business only pays for the storage and compute time actually used, and the cloud instances can be turned off until they're needed again.
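As a rough illustration of that pay-for-what-you-use model, a back-of-the-envelope cost sketch in Python (the server count, job duration and hourly rate below are hypothetical, not any real provider's prices):

```python
# Back-of-the-envelope cloud cost: with on-demand instances you pay
# only for the servers and hours a big data job actually consumes.
def cloud_cost(servers, hours, rate_per_server_hour):
    """Total cost of renting `servers` machines for `hours` hours."""
    return servers * hours * rate_per_server_hour

# e.g. 1,000 servers for a 4-hour batch job at a hypothetical $0.10/server-hour:
print(cloud_cost(1000, 4, 0.10))  # 400.0
```

After the job finishes, the instances are shut down and the cost stops accruing, which is exactly why occasional big data workloads favor the cloud over owned hardware.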

Why Is Big Data Important?

The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
·      Determining root causes of failures, issues and defects in near-real time.
·      Generating coupons at the point of sale based on the customer’s buying habits.
·      Recalculating entire risk portfolios in minutes.
·      Detecting fraudulent behavior before it affects your organization.
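As a minimal sketch of the last idea, fraudulent or anomalous transactions can be flagged as statistical outliers (the threshold and the sample amounts below are hypothetical; real fraud detection uses far richer models):

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts that lie more than `threshold` standard
    deviations from the sample mean (a crude outlier test)."""
    mu, sigma = mean(amounts), stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# A customer's recent purchase amounts, with one suspicious spike:
history = [42.0, 39.5, 41.2, 40.8, 43.1, 38.9, 40.0, 950.0]
print(flag_anomalies(history))  # [950.0]
```

The point is the shape of the task, not the statistics: the same check running over a live transaction feed is what lets fraud be caught before it affects the organization.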

Streaming data

This category includes data that reaches your IT systems from a web of connected devices. You can analyze this data as it arrives and make decisions on what data to keep, what not to keep and what requires further analysis.
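A minimal Python sketch of that keep-or-discard decision, using a generator so each record is handled as it arrives rather than after everything has landed (the sensor records and the 80-degree alert threshold are made up for illustration):

```python
def stream_filter(records, keep_if):
    """Process records one at a time as they arrive, yielding only
    those worth keeping for storage or further analysis."""
    for rec in records:
        if keep_if(rec):
            yield rec

# Simulated sensor stream: keep only readings above a hypothetical threshold.
readings = [{"sensor": "s1", "temp": 21.5},
            {"sensor": "s2", "temp": 87.0},
            {"sensor": "s3", "temp": 19.9}]
alerts = list(stream_filter(readings, lambda r: r["temp"] > 80))
print(alerts)  # [{'sensor': 's2', 'temp': 87.0}]
```

In a real deployment the `readings` iterable would be a message queue or socket rather than a list, but the decide-as-it-arrives pattern is the same.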


Social media data

The data on social interactions is an increasingly attractive set of information, particularly for marketing, sales and support functions. It's often in unstructured or semi structured forms, so it poses a unique challenge when it comes to consumption and analysis.
Publicly available sources

Massive amounts of data are available through open data sources like the US government’s data.gov, the CIA World Factbook or the European Union Open Data Portal.

Big Data as business engine of opportunity


So where do you start? Think of big data as an engine. To boost performance, it’s a matter of assembling the right components in a seamless, stable and sustainable way. Those components include:
·      Data Sources: operational and functional systems, machine logs and sensors, Web and social media, and many other sources.
·      Data Platforms, Warehouses and Discovery Platforms: these enable the capture and management of data and then – critically – its conversion into customer insights and, ultimately, action.
·      Big Data Analytics Tools and Apps: the “front end” used by executives, analysts, managers and others to access customer insights, model scenarios and otherwise do their jobs and manage the business.
At this level, it’s about harnessing and exploiting the full horsepower of big data assets to actually create business value. Making it all work together requires a strategic big data design and thoughtful big data architecture that not only examines current data streams and repositories, but also accounts for specific business objectives and longer-term market trends. In other words, there is no single template for making big data work. We’re not talking about commercial off-the-shelf (COTS) software here.


CATEGORIES OF 'BIG DATA'

'Big data' can be found in three forms:
1.    Structured
2.    Unstructured
3.    Semi-structured


Structured


Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over time, computer science has achieved great success in developing techniques for working with such data (where the format is known in advance) and deriving value from it. However, we now foresee issues as the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Examples of Structured Data
An 'Employee' table in a database is an example of Structured Data
Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000
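Because the schema is fixed and known in advance, such a table can be loaded and queried with plain SQL. A small sketch using Python's built-in sqlite3 module and a subset of the rows above:

```python
import sqlite3

# The fixed, predeclared schema is exactly what makes this data "structured".
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID INTEGER PRIMARY KEY,
    Employee_Name TEXT, Gender TEXT,
    Department TEXT, Salary_In_lacs INTEGER)""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
])
rows = conn.execute("""SELECT Employee_Name FROM Employee
                       WHERE Department = 'Admin'
                       ORDER BY Employee_ID""").fetchall()
print(rows)  # [('Pratibha Joshi',), ('Shushil Roy',)]
```

Queries like this are cheap precisely because the format is known up front; the challenges discussed above begin when the volume outgrows a single database.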

Unstructured


Any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of data available to them but unfortunately don't know how to derive value from it, since this data is in its raw or unstructured form.
 
Semi-structured

Semi-structured data can contain both forms of data. It may appear structured in form, but it is not actually defined with, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
Examples of Semi-structured Data
Personal data stored in a XML file-
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
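Such records can be parsed with standard XML tooling once they are wrapped in a single root element (which well-formed XML requires). A small sketch using Python's built-in xml.etree module on two of the records above:

```python
import xml.etree.ElementTree as ET

# A fragment of the personal data above, wrapped in one root element
# so the document is well-formed XML.
xml_doc = """<people>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</people>"""

root = ET.fromstring(xml_doc)
names = [rec.find("name").text for rec in root.findall("rec")]
print(names)  # ['Prashant Rao', 'Seema R.']
```

This is what "semi-structured" means in practice: the tags give each record enough structure to extract fields, without any fixed schema being declared in advance.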

Characteristics of 'Big Data'

(i) Volume – The name 'Big Data' itself relates to a size which is enormous. The size of data plays a very crucial role in determining its value. Whether particular data can actually be considered big data also depends on its volume. Hence, 'volume' is one characteristic that needs to be considered when dealing with big data.
(ii) Variety – The next aspect of big data is its variety. Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demands determines its real potential. Big data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors and mobile devices. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency the data can show at times, which hampers the process of handling and managing the data effectively.

Benefits of Big Data Processing


The ability to process big data brings multiple benefits, such as:
• Businesses can utilize outside intelligence when making decisions
Access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.
• Improved customer service
Traditional customer feedback systems are being replaced by new systems designed with big data technologies. In these new systems, big data and natural language processing technologies are used to read and evaluate consumer responses.
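As a toy illustration of reading consumer responses programmatically (the keyword lists below are hypothetical; production systems use real NLP models rather than keyword matching):

```python
import re

# Toy keyword-based scoring of consumer feedback. The word lists are
# made up for illustration; real systems use trained NLP models.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"broken", "slow", "refund", "terrible"}

def score(review):
    """Positive minus negative keyword hits in one review."""
    words = set(re.findall(r"[a-z]+", review.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

print(score("Great product, fast delivery"))  # 2
print(score("Terrible, arrived broken"))      # -2
```

Even this crude scheme shows the pipeline shape: ingest free-text responses at volume, score them automatically, and route the negative ones to customer service first.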
• Early identification of risk to the product/services, if any
• Better operational efficiency
Big data technologies can be used to create a staging area or landing zone for new data before identifying which data should be moved to the data warehouse. In addition, such integration of big data technologies with a data warehouse helps an organization offload infrequently accessed data.
Which programming language is best for big data?


Let's go through the important ones.
·      Python. Python tops the list and is the most popular language among data scientists.
·      R. R has been kicking around since 1997 as a free alternative to pricey statistical software such as MATLAB or SAS.
·      Java. 
·      Scala.


Which programming language to learn for big data?


Every year KDnuggets conducts a poll on “What programming/statistics languages are used for data science work”. In the 2014 survey, the popular tools and programming languages mentioned were R, Python, SAS, MATLAB, SPSS, MySQL and Java.


APPLICATIONS OF BIG DATA


Banking


With large amounts of information streaming in from countless sources, banks are faced with finding new and innovative ways to manage big data. While it’s important to understand customers and boost their satisfaction, it’s equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics.

Education

Educators armed with data-driven insight can make a significant impact on school systems, students and curricula. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and implement a better system for the evaluation and support of teachers and principals.

Government

When government agencies are able to harness and apply analytics to their big data, they gain significant ground when it comes to managing utilities, running agencies, dealing with traffic congestion or preventing crime. But while there are many advantages to big data, governments must also address issues of transparency and privacy.

Healthcare
Patient records. Treatment plans. Prescription information. When it comes to health care, everything needs to be done quickly, accurately – and, in some cases, with enough transparency to satisfy stringent industry regulations. When big data is managed effectively, health care providers can uncover hidden insights that improve patient care.

Manufacturing

Armed with insight that big data can provide, manufacturers can boost quality and output while minimizing waste – processes that are key in today’s highly competitive market. More and more manufacturers are working in an analytics-based culture, which means they can solve problems faster and make more agile business decisions.

Retail

Customer relationship building is critical to the retail industry – and the best way to manage that is to manage big data. Retailers need to know the best way to market to customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business. Big data remains at the heart of all those things.
