Data Vault: Learn Business Vault Secrets


Hello Data Vaulters! Welcome to another in our series of
Data Vault video presentations. This time we want to look
at the Business Vault. Business Vaults are an extremely powerful feature of the Data Vault method, but they aren’t given much publicity. We’ve seen many confused teams who’ve lost time and had to unpick mistakes. It’s a pity, because with a bit of guidance those mistakes could have been avoided. Before we get started, who are we? We’re Datavault UK, the UK’s leading Data Vault consultancy. We run the UK Data Vault User Group, where those interested in Data Vault can meet to
learn and share their experiences. And for data engineers, we’ve written and
support a package for the free dbt tool that you can use to build your own Data
Vault solution. This diagram shows the various layers of the Data Vault method. A commonly mentioned component is the Business Vault. This is the place where we store the results of business rule calculations and other
types of derived data as well. Dan defines the Business Vault as follows. I think the key points are: that it contains wholly derivable data and that it’s calculated after the Raw Data Vault and before loading the Information
Marts. So let’s start by busting the first misunderstanding. The Business Vault isn’t really a separate layer. It’s not a separate schema. It’s held inside the Raw Vault. Yes – the Business Vault has new tables
but they’re overlaid on the Raw Vault structure. Marts then feed from both the Raw Data Vault and the Business Vault tables. What types of data can the Business Vault hold? We might need to pre-calculate data for efficiency reasons, so it can be consumed by a downstream dashboard. We might need to calculate helper tables that drive better performance. We might calculate new values such as ratios. We might be using the results of data
science for our business. And finally, we might be checking data quality and want to calculate quality measures. Let’s look at the first item on the list. Some processing might be needed to get data out of the Vault. Your users might not want the full granularity of data held in the Raw Vault; they may want aggregated, filtered or masked data in their reporting Marts. The point here is that the data isn’t transformed. It’s simply added up, grouped, masked or otherwise processed before it’s consumed. There are three possible places to build consumption rules: you can run them in the Raw Vault, creating Business Vault tables; you can run them on the way out to the Mart; or you can run them in the end-user BI tool’s cache. If processing is intensive and the results are used by more than one Mart, you’re best running the rule inside the Vault to populate Business Vault tables. If processed data is used in one Mart only, you can run those rules as you populate the Mart. And finally, you might consider implementing local processing rules in the end-user BI tool if the data is only used in one display. There are several things to consider when deciding where to code the presentation rules:
security – it’s better to keep data in one place in the Vault and only let out what’s needed; performance and cost – the Vault is generally cheaper and faster; what your BI tool can and can’t do – it may have limits on data volumes or on the number of feeds per day; what your users actually need – aggregate data is usually enough for them; and who needs access to what, and what overlaps there are between groups of end-users. Next on the list is helper tables.
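Before moving on to helper tables, here is a minimal sketch of a consumption rule materialised as a Business Vault table. It uses Python’s sqlite3 purely as a stand-in warehouse; the table and column names (sat_order_line, bv_customer_spend) are invented for illustration and not part of any Data Vault standard:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Illustrative raw Satellite holding order lines at full granularity.
cur.execute(
    "CREATE TABLE sat_order_line "
    "(customer_hk TEXT, order_date TEXT, amount REAL, card_number TEXT)"
)
cur.executemany(
    "INSERT INTO sat_order_line VALUES (?, ?, ?, ?)",
    [
        ("C1", "2023-01-05", 10.0, "4111111111111111"),
        ("C1", "2023-01-09", 15.0, "4111111111111111"),
        ("C2", "2023-01-07", 20.0, "4222222222222222"),
    ],
)

# Consumption rule: aggregate to one row per customer and mask the card
# number. The data is only summed, grouped and masked -- not transformed.
cur.execute(
    """
    CREATE TABLE bv_customer_spend AS
    SELECT customer_hk,
           SUM(amount) AS total_spend,
           '****' || SUBSTR(card_number, -4) AS masked_card
    FROM sat_order_line
    GROUP BY customer_hk
    """
)

rows = cur.execute(
    "SELECT * FROM bv_customer_spend ORDER BY customer_hk"
).fetchall()
print(rows)  # [('C1', 25.0, '****1111'), ('C2', 20.0, '****2222')]
```

The key point is that the SUM, GROUP BY and masking expression only reshape the data for consumption; no business meaning is changed.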
Helper tables are the Data Vault Pit and Bridge tables: really powerful
techniques that support virtual reporting Marts and virtual dimension and fact
tables. We could spend a while on these, but we don’t have time, so we’ll
introduce them here; you can check Dan’s book for more information, and I’m
sure we’ll cover them in a future video. So what does a Pit, or point-in-time,
table do? Say we have a Satellite holding customer details from our CRM system.
This table has an effective-from date; there are no start or end dates here. So to find which Satellite rows are
valid for a given date, we have to find the record for each customer primary key
with the greatest effective-from date before the reference date. This can be
quite an expensive calculation, although SQL window functions help to
reduce that load. One Satellite table may be OK for that sort of processing, but what if you’ve got two, three or more Satellites off the same Hub? Perhaps one Satellite is fed throughout the day, another gets a weekly update and
others are fed daily. Which values are valid for a given date? That’s quite a calculation, and we don’t want to repeat it each time we query the data. So a Pit table holds a pointer to the record in each of the Satellites that’s valid
for each customer for each day. For example, and this is common in many businesses, our users might want to process end-of-day data only. The Pit table is then calculated after loading the Raw Vault, as a Business Vault
process: we find the end-of-day record for each customer in each
Satellite and insert references into the Pit table. When we want to find the
relevant Satellite records we do an equi-join with the Pit table, which is
much faster than looking for the greatest end date or greatest effective
date. The Pit table is quite long but thin, and we can trim it to
hold only a few months of reference data. Now let’s look at Bridge tables.
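First, though, here is the Pit pattern made concrete with a toy sketch, again using sqlite3 as a stand-in warehouse; the schema (sat_customer_details, pit_customer) is an invented illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Illustrative customer Satellite with an effective-from date only.
cur.execute(
    "CREATE TABLE sat_customer_details "
    "(customer_hk TEXT, effective_from TEXT, name TEXT)"
)
cur.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?)",
    [
        ("C1", "2023-01-01", "Alice"),
        ("C1", "2023-03-01", "Alice B."),
        ("C2", "2023-02-01", "Bob"),
    ],
)

# Business Vault process: for each customer and reference date, store a
# pointer to the Satellite row that was valid at that date. (A window
# function such as ROW_NUMBER() could compute the same pointers.)
cur.execute(
    "CREATE TABLE pit_customer "
    "(customer_hk TEXT, as_of_date TEXT, sat_effective_from TEXT)"
)
cur.execute(
    """
    INSERT INTO pit_customer
    SELECT customer_hk, '2023-02-15', MAX(effective_from)
    FROM sat_customer_details
    WHERE effective_from <= '2023-02-15'
    GROUP BY customer_hk
    """
)

# Consumers now do a cheap equi-join instead of re-running the MAX() search.
rows = cur.execute(
    """
    SELECT s.customer_hk, s.name
    FROM pit_customer p
    JOIN sat_customer_details s
      ON s.customer_hk = p.customer_hk
     AND s.effective_from = p.sat_effective_from
    WHERE p.as_of_date = '2023-02-15'
    ORDER BY s.customer_hk
    """
).fetchall()
print(rows)  # [('C1', 'Alice'), ('C2', 'Bob')]
```

The expensive “greatest effective-from before the reference date” search runs once, when the Pit table is built; every downstream query pays only for the equi-join.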
It’s common to have queries that navigate Hubs and Links to fetch data
from Satellites across the data model. Here’s an example: we’ve got three
Hubs joined by two Links, but a real model could join many, many more, and possibly radiate out across multiple chains as well. The SQL to query that is quite repetitive.
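As an illustration of that repetitiveness (the schema below is invented for the sketch), a traversal across three Hubs and two Links might look like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript(
    """
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY);
    CREATE TABLE link_customer_order (customer_hk TEXT, order_hk TEXT);
    CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY);
    CREATE TABLE link_order_product (order_hk TEXT, product_hk TEXT);
    CREATE TABLE hub_product (product_hk TEXT PRIMARY KEY);

    INSERT INTO hub_customer VALUES ('C1');
    INSERT INTO hub_order VALUES ('O1');
    INSERT INTO hub_product VALUES ('P1'), ('P2');
    INSERT INTO link_customer_order VALUES ('C1', 'O1');
    INSERT INTO link_order_product VALUES ('O1', 'P1'), ('O1', 'P2');
    """
)

# Every query that needs customer-to-product context repeats this join
# chain; a Bridge table would materialise the result once per reference
# date instead.
rows = cur.execute(
    """
    SELECT hc.customer_hk, hp.product_hk
    FROM hub_customer hc
    JOIN link_customer_order lco ON lco.customer_hk = hc.customer_hk
    JOIN hub_order ho            ON ho.order_hk     = lco.order_hk
    JOIN link_order_product lop  ON lop.order_hk    = ho.order_hk
    JOIN hub_product hp          ON hp.product_hk   = lop.product_hk
    ORDER BY hp.product_hk
    """
).fetchall()
print(rows)  # [('C1', 'P1'), ('C1', 'P2')]
```

With a Bridge table in place, consumers read the pre-joined rows with a single lookup instead of repeating the four joins in every query.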
You can make mistakes writing that SQL if you’re not concentrating,
and different grains can also cause the query to grow and develop
performance problems. Bridge tables pre-calculate the navigation for each reference date, just like Pit tables do for Satellites. They’re also great tools for delivering aggregates, since they can store sums of values as extra
attributes on the table. If you take Pit and Bridge tables together, the secret is this: look closely and you’ll see that the Pit table is a
star schema dimension and the Bridge is quite close to a fact table. Pit and
Bridge tables thus pre-calculate the work involved in loading a Mart, and you can
actually build a Mart as views on these helper tables to give you extra agility. Business rules calculate new data. A rule could work within a table, for example calculating the ratio of two columns. Or it can work across tables; say we want to select a customer name from two
tables: we want it from this table here unless it’s missing, in which case we
look it up in that table over there – an integration rule. Business rule results are stored in business Satellites. These look like other Satellites, except that the source column records a business rule. Business rule Satellites hang off the same Hubs or Links that the raw Satellites use. A common use case for a business rule Satellite is to select the best view of records across a set of source systems: if sources disagree, which one should we prefer for the date of birth, name, address and so on? We could code that decision process as a rule, run it after each raw load, and populate a clean business rule Satellite to store the results. Another feed for the Business Vault is
the results of data science. Let’s say a data scientist has run some experiments and found something interesting and useful for the business, so they built a model and deployed it into production. The model takes some data from a Data Lake and from our Data Vault, and it produces some new data. As far as the Data Vault is concerned, the model is really just another source system, so we feed the results into a staging layer and load them back into the Vault as normal. Finally, you might want to measure data quality. One way of doing this is to write views on the data that expose quality problems: orphaned Satellites, malformed Hub keys, missing dates, out-of-range data and so on. If the data is OK, the views are empty; if not, we can see the offending records in the view. These views are almost a form of Satellite which we can attach to a Hub or Link record, giving the rule its context. So how can we implement our business rules? We could calculate our business Satellites and populate them directly from a business rule, which is an option, but perhaps there’s a better way. Say we have our regular Data Vault load, as illustrated: stage data through ETL into the Raw Vault, then apply business rules to that raw data. If we treat our rules engine as another source system, we can feed its output back to the staging layer and then load the data just like any source data. As we’ve already built robust loading utilities, we can use them to load that
business rule data properly. Well, that’s a quick overview of the Business Vault. In summary: don’t be confused – the Business Vault is just a part of the Vault, not a separate layer. Business rules can create new Satellites, and sometimes they may create a Link or Hub too. There are different types of rule: summaries, derivations, helper tables, data science output and quality checks. And we can drop and recalculate business rules at any time. If you need more detailed information, you can download User Group presentations from our UK User Group website. We also offer Data Vault and Information Governance related blogs and white papers on our company site. And if you’d like to experiment with the Data Vault method, we have developed a package for the free-to-use ‘dbt’ tool; it will generate SQL to load the Vault from your metadata. Check it out, again on our website. Thanks for listening, and we
hope to see you again!

American Eagle: Building a multi-terabyte marketing data warehouse



We've got 20 million customers and we sell 60 thousand different items every year. You can imagine the interactions if you start multiplying all those numbers together – that's a big number. We do run a big business, and we're proud of that. BigQuery makes it possible to reason over all of that data.

Your vision of setting up the entire enterprise marketing ecosystem on top of Google Cloud helped us think outside the box: to leverage technologies like BigQuery, Dataflow, Dataproc and Cloud ML to apply artificial intelligence and machine learning, and to create the real-time sales dashboards that your organization was looking forward to. That's what we are actually looking for – to create those kinds of disruptive solutions in the market and help the wider community, combining those digital capabilities from the core marketing side, backed up with all this powerful database technology and powerful AI and machine learning technology, so that we can have an intimate understanding of our customers and the things that they expect from us as brands. I think it's incumbent upon us as brands that we talk to people as individuals and that we understand people as individuals.

One thing I would like to highlight is what we have accomplished in six months: bringing in data from all these different systems and having a marketing data warehouse up and running – something genuinely useful, with business value coming out of it. That's only possible through the stack that Google believes in, which is fully managed services. And we could be productive from day one, from hour zero. Through our personalization program in particular, we estimate about a 4X conversion increase based on the work that we've done on Google Cloud. And based on the work we've done with Merkle, we've seen massive increases in customer lifetime value, which we're now measuring because all our customer data is on Google Cloud.

If you think about how Google runs search – ingesting the entire web and then giving us the capability to search anything – and about leveraging those services to power your own organization: I think that's the power Google brings to the table.

What is Business Analytics? | Career Growth in Business Analytics | Introduction to Business Analytics



Business Analytics, in essence, is about how you use data, how you use IT, and how you use a conceptual layer of business logic to understand how a problem would be resolved. That, in a nutshell, is Business Analytics. If you extend it a little, with some experience nudged into it, it would be called data science.

I wouldn't restrict this field of study to a specific domain – not that only an engineer should do it, or only a B.Sc. graduate, or only somebody from a commerce background. In fact it can be done by anybody. I have had students who have come from the music industry, saying that they like creativity; if you say there is creativity in music, the same exists in data patterns too. So it doesn't restrict us – anybody can learn it. The only question is whether you have the interest, and whether you want to use it in your domain. The applications are varied: it can be applied in pharma, in manufacturing, in consulting – any domain you talk about, it is applicable.

The one important attribute a person should have is an interest in learning. Analytics is a complicated subject, no doubt about that, but I wouldn't say you should restrict yourself by saying "I don't know IT, I don't know statistics, I don't understand anything about business, so I shouldn't be stepping into this." No – what stops you is whether you are ready to learn or not. You should be able to adapt, be ready to learn, be agile and understand it. That is the only thing required in a student of analytics.

The course structure assumes that a person doesn't know IT, doesn't understand statistics, and has never dealt with business case study situations. Naturally, people come with one strong background – somebody might be from an IT field, another person from a stats field, somebody might just be in sales and marketing and know nothing about statistics or IT – but we can bring each person up to the level expected by the course. It's an advantage if you already know something, but not knowing doesn't stop you from becoming a resource in business analytics. Nothing is stopping you; there is no barrier – you just have an advantage if you come with one.

In fact it makes even more sense for a student to learn it. This notion of analytics is obviously coming from the West, where in essence 80 to 90 percent of the work is done using analytics, whereas over here we do our jobs by gut feeling. Now corporates and vice presidents of major companies have started thinking, "yes, we want to incorporate analytics in every decision we make." So for a person who is just graduating – maybe from an engineering course, a B.Sc. course, a B.Com course, or even an arts course – it makes sense to learn this. For a fresher it means getting an edge over the other candidates within his college premises, because he knows analytics and others don't. And frankly speaking, most engineering colleges today have started grooming their candidates on some aspect of analytics, but it's restricted by the curriculum they can be exposed to – more theory and less practice.

A typical data analyst in the US would be a short version of a data scientist, whose job is to read patterns in data. That doesn't mean he wouldn't be working with data: he would start from scratch – extracting data, cleaning the data, reading patterns out of it, making sense of those patterns and putting up the case in front of the business. If I look at a data scientist's job description in the Indian market specifically, it would start right from creating the requirement for analytics, because today the market doesn't always see that requirement. You might have a retail company which doesn't think about running campaigns based on customers' buying patterns, but just runs blanket campaigns. I would want to customize the campaign to the requirement of the customer – a win-win for the company as well as the customer – and in this case analytics helps. The data scientist can be of great help to the company in reducing the cost of marketing and increasing its efficiency, and this applies in every domain: marketing, sales, HR. Going forward we can see applications where data scientists work alongside the HR team to analyse which set of employees should be retained for a longer period of time. These are typical things a data scientist would be doing for a company.

Now from a fresher's perspective: doing a course in data science or business analytics gives you an edge with the companies that are recruiting. Even for a marketing professional, companies would expect the candidate to know about analytics. If a retail company is looking at customer analytics, they would expect someone who knows marketing and sales plus analytics; they don't want a person who knows only one domain. So it's imperative that you add up all the skills. Do you see any cricketer today who can say "I can only bowl"? No – today we expect a player to know bowling, batting, fielding, everything that can be done on the field. That is a data scientist: he can give excellent presentations, talk in front of a big crowd, work with code, work with data. He understands databases – not that he will manage the database, but he is in a position to guide a database administrator on how the database should be set up to work in his favour. That's the skill-set level.

From the fresher's perspective, the first job would not be very high-paying, but it will be reasonable – somewhere around 4 to 5 lakhs – if he is not from a great college. Obviously if you are from an IIT or an IIM it gives you an edge, because of the college brand. Initially you will have to work with a lot of data: extract data, do all the preparation work on it, and present your reports. Later, as you accumulate more knowledge – today we are talking only about Oracle, but data scientists might talk about NoSQL, Hadoop and all those things – you will enrich yourself and become an asset to the company, and that is where a data scientist creates value for himself.

I wouldn't call it the next big thing, because a blip or a boom is usually followed by a drop. This is not going to be followed by a drop; it is there to sustain itself for a longer period. The names might change, and we might see more innovation; we are adapting and will adapt more. Slowly, as we go forward, analytics will become a common thing – and if it is becoming a common thing, it's not just a boom; it is there to stay.

I would recommend students to take up a course like this irrespective of whether they see a career for themselves in analytics or not, because going forward, no matter what domain you are working in, analytics will come into play – it will cross paths with you, and at that point, if you don't know about it, it's a problem. Someone should come to IMS Proschool because it has a holistic approach: there is a lot of content depth, a lot of research has been done before the course was launched, and it is followed by a strong examination path, which makes sure the student or participant has thoroughly understood the subject. We have the NSE certification which comes with this, so when you clear that, it's an indication that yes, you are ready for the industry. Along with that, when you take this course at IMS Proschool, you are doing a practical mix of application along with the theoretical content taught in class, and this gives you an edge over others. Today, a critical combination of something creative and something analytical is nothing but a data scientist. So here we are.