The Data-Informed Institution


2. Store it

Once we acquire the data, we must store it to make it available for analysis. Traditionally, we stored data in a structured format based on our expectations about how it would be used operationally. For example, we might have a field in a database for "course credits" and another field for "class size limit." We would collect the data to fill these fields and file it away by slotting the values into the appropriate blanks in the database, knowing that we could always use them operationally in presenting course offerings. By forcing the data into such a mold, we made it useful for transactions, but we might have lost information that could have been useful for analysis.

This was the relational database model. The past few decades have been dominated by relational databases, which are very well suited to efficient processing of old-world volumes of transactional data in ways that are known in advance ("multiply grade quantity by course credit"). But when you are working with non-transactional data, operating at tremendous internet scales of transactions, or managing data that does not slot easily into pre-defined "data fields," there are now much better alternatives, purpose-designed for the cloud. For example, Amazon Timestream is a database designed specifically to manage time-series data (like the data produced over time by an industrial sensor or by tracking market activity); Amazon Quantum Ledger Database is intended for the type of data used in blockchain (data whose history must be verifiable, using techniques like cryptography); and Amazon Neptune is designed for representing complex connections and relationships, like social networks.
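The "slotting data into pre-defined blanks" idea can be sketched with a small in-memory relational table. This is an illustration only, assuming a hypothetical course schema; the table and field names are invented for the example, not taken from any real system:

```python
import sqlite3

# A pre-defined schema: every course must fit these exact fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE courses (
           course_id        TEXT PRIMARY KEY,
           course_credits   INTEGER,
           class_size_limit INTEGER
       )"""
)

# The data is filed away by filling in the blanks the schema allows;
# anything that doesn't fit the mold is simply lost at write time.
conn.execute("INSERT INTO courses VALUES (?, ?, ?)", ("MATH101", 4, 30))

# Very efficient for transactional queries that are known in advance...
row = conn.execute(
    "SELECT course_credits, class_size_limit FROM courses WHERE course_id = ?",
    ("MATH101",),
).fetchone()
print(row)  # (4, 30)
```

The trade-off the text describes is visible here: the fixed columns make operational lookups fast and predictable, but any attribute not anticipated in the schema has nowhere to go.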
Enterprises are no longer limited to what they can force-fit into a relational model. Better still (for agility), data that will be used for yet-undetermined analysis can be stored in a flexible repository called a data lake, where each piece of data is stored simply in the form in which it was received. The power of the data lake lies in the tools that can be used to analyze it: tools that let you combine heterogeneous information, mixing together structured and unstructured data, data from different organizational silos, and data in large quantities. Today's tools can apply ML algorithms and statistical analyses, and they can work with natural language text, video, and speech. In the case of an acquisition (in EdTech, for example), we can quickly set up a way to pour data from a newly acquired organization into the lake, thereby gaining transparency into its operations, and we can integrate its data with our own.

The magic that makes this all possible is: (1) the low cost of storage, (2) the availability of tools that work with loosely structured, heterogeneous data, and (3) the availability of services that let you push data into the data lake at high bandwidth and asynchronously (just send the data toward the data lake as you receive it, and it will get there as quickly as it can; no need to wait, sort of like sending an email). In other words, the data lake meets the enterprise need for storing data before it knows all the ways it will be used. We can pour data into the lake from different functional silos and analyze it all together.
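The contrast with the relational mold can be sketched as a tiny local data lake: records are written exactly as they arrive, with no schema imposed until analysis time ("schema on read"). This is a minimal local sketch only; a real lake would use a cloud object store such as Amazon S3, and the source names and record shapes below are hypothetical:

```python
import json
import tempfile
import uuid
from pathlib import Path

# Minimal local stand-in for a data lake: a directory of raw records.
LAKE = Path(tempfile.mkdtemp())

def ingest(source: str, record: dict) -> Path:
    """Store a record in the form it was received, partitioned only by source."""
    path = LAKE / source / f"{uuid.uuid4().hex}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record))
    return path

# Heterogeneous data from different silos goes in untouched --
# no pre-defined fields, no force-fitting.
ingest("registrar", {"course_id": "MATH101", "credits": 4})
ingest("lms_clickstream", {"user": "s42", "event": "video_play", "t": 1712})
ingest("acquired_org", {"legacy_field_7": "whatever shape it arrived in"})

# Analysis happens later, reading whatever structure each record has.
records = [json.loads(p.read_text()) for p in LAKE.rglob("*.json")]
print(len(records))  # 3 records, three different shapes, one lake
```

The design point mirrors the text: nothing is decided at write time, so data from a newly acquired organization can be poured in immediately and combined with everything else when the analysis questions become clear.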
