Google Cloud Launches BigLake, a New Cross-Platform Data Storage Engine

Written by ga_dahmani

At its Cloud Data Summit, Google today announced the preview release of BigLake, a new data lake storage engine that makes it easier for companies to analyze the data in their data warehouses and data lakes.

The idea here, essentially, is to take Google’s experience with running and managing its BigQuery data warehouse and extend it to data lakes on Google Cloud Storage, combining the best of data lakes and warehouses into a single service that abstracts away the underlying storage formats and systems.

It’s worth noting that this data could sit in BigQuery or live on AWS S3 and Azure Data Lake Storage Gen2, as well. Through BigLake, developers will get access to one uniform storage engine and the ability to query the underlying data stores through a single system without having to move or duplicate data.

“Managing data across disparate data lakes and warehouses creates silos and increases risk and cost, especially when data needs to be moved,” said Gerrit Kazmaier, vice president and general manager of databases, data analytics and business intelligence at Google Cloud, in today’s announcement. “BigLake enables companies to unify their data lakes and warehouses to analyze data without worrying about the underlying storage system or format, eliminating the need to duplicate or move data from one source and reducing costs and inefficiencies.”

Using policy tags, BigLake allows administrators to configure their security policies at the table, row, and column levels. This includes data stored in Google Cloud Storage, as well as the two supported third-party systems, where BigQuery Omni, Google’s multi-cloud analytics service, enables these security controls. Those security controls also ensure that only the right data flows into tools like Spark, Presto, Trino, and TensorFlow. The service also integrates with Google’s Dataplex tool to provide additional data management capabilities.
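To make the idea of row- and column-level controls concrete, here is a minimal sketch in plain Python. It is not BigLake's API or policy-tag syntax; the `apply_policy` helper, the sample rows, and the "EU analyst" policy are all hypothetical and only illustrate how such a policy narrows what a downstream tool would see.

```python
def apply_policy(rows, allowed_columns, row_filter):
    """Return only the rows and columns a (hypothetical) policy permits.

    rows            -- list of dicts, one per table row
    allowed_columns -- set of column names the reader may see (column-level)
    row_filter      -- predicate deciding which rows are visible (row-level)
    """
    return [
        {col: val for col, val in row.items() if col in allowed_columns}
        for row in rows
        if row_filter(row)
    ]


# Made-up sample table, for illustration only.
rows = [
    {"customer": "a", "region": "EU", "email": "a@example.com", "spend": 120},
    {"customer": "b", "region": "US", "email": "b@example.com", "spend": 75},
]

# Hypothetical policy: EU rows only, and the email column is hidden.
eu_analyst_view = apply_policy(
    rows,
    allowed_columns={"customer", "region", "spend"},
    row_filter=lambda row: row["region"] == "EU",
)
```

In this sketch the analyst's view contains a single EU row with no `email` field. In a real deployment the filtering would be enforced by the storage engine, not by client code.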

Google notes that BigLake will provide fine-grained access controls and that its API will span Google Cloud, as well as open column-oriented file formats such as Apache Parquet and open source processing engines like Apache Spark.

“The volume of valuable data that organizations have to manage and analyze is growing at an incredible rate,” explain Google Cloud Software Engineer Justin Levandoski and Product Manager Gaurav Saxena in today’s announcement. “This data is increasingly distributed across many locations, including data warehouses, data lakes, and NoSQL stores. As an organization’s data becomes more complex and proliferates across disparate data environments, silos arise, creating increased risk and cost, especially when that data needs to be moved. Our customers have made it clear: they need help.”

In addition to BigLake, Google also announced today that Spanner, its globally distributed SQL database, will soon get a new feature called “change streams.” With these, users can easily track any changes to a database in real time, be those inserts, updates, or deletes. “This ensures that customers always have access to the most up-to-date data, as they can easily replicate changes from Spanner to BigQuery for real-time analysis, trigger downstream application behavior via Pub/Sub, or store changes in Google Cloud Storage (GCS) for compliance,” Kazmaier explains.
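As a rough illustration of what consuming a stream of inserts, updates, and deletes involves, the sketch below replays change records against an in-memory replica. It assumes a made-up record shape (`op`, `key`, `values`) and a hypothetical `apply_changes` helper. It does not use Spanner's actual change stream format or client library.

```python
def apply_changes(replica, changes):
    """Replay change records, in order, against a dict keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op == "delete":
            # Remove the row; ignore deletes for keys we never saw.
            replica.pop(key, None)
        elif op in ("insert", "update"):
            # Merge the new column values over whatever is already stored.
            replica[key] = {**replica.get(key, {}), **change["values"]}
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


# Hypothetical change records, oldest first.
changes = [
    {"op": "insert", "key": 1, "values": {"name": "Ada", "plan": "free"}},
    {"op": "update", "key": 1, "values": {"plan": "pro"}},
    {"op": "insert", "key": 2, "values": {"name": "Bob", "plan": "free"}},
    {"op": "delete", "key": 2},
]

replica = apply_changes({}, changes)
```

After the replay, the replica holds only row 1, with its plan updated to `pro`. Applying records strictly in order is what keeps the copy consistent with the source.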

Google Cloud today also brought Vertex AI Workbench, a tool for managing the entire life cycle of a data science project, out of beta and into general availability, and launched Connected Sheets for Looker, as well as the ability to access Looker data models in its Data Studio BI tool.
