
Data security blindspots for your data cloud and lakehouse

Supreeth Rao

Cloud data warehouses and lakehouses provide a unified platform that combines data lake storage with big data processing, machine learning, and analytics. While these platforms have robust native security features, it is still essential to be aware of potential data security blindspots and to follow best practices.

Some of these blindspots include:

  1. Inadequate Data Classification: Security teams need to know where their most sensitive data resides. Classifying data by sensitivity and applying access controls based on that classification is the best way to gain situational awareness and protect that data. Many enterprises and industries also rely on custom taxonomies, and these must be well understood too. Good security starts with good visibility.
  2. Authentication and Authorization Gaps: Strong authentication methods such as Single Sign-On (SSO) and Multi-Factor Authentication (MFA) should be table stakes for any cloud platform by now. Going a step further, organizations should always enforce the principle of least privilege, ensuring that users can access ONLY the resources they need. This simple step helps prevent data abuse.
  3. Insecure Data Sharing: Data sharing is a significant part of the new data economy, and convenient sharing drives much of the value of the business models built on it. Still, companies should be cautious when sharing data, Python jobs, notebooks, or other resources with other users or external parties. Understanding the context of what is being shared, how significant that data is, and whether the share is (or can be) time-bound is essential to judging how sharing can affect the company beyond the simple economic rationale. Incorrect data sharing can devastate a company’s business and reputation.
  4. Insider Threats: Ensure you have tools and processes to address insider threats. Have users been phished? Are privileged users dumping data? If so, why, and where is the data going? Can you specifically detect exfiltration activity, and how? What steps are in place to respond to and prevent such attacks? Are those responses automated or manual?
  5. Insufficient Monitoring and Auditing: Regularly monitor and audit your cloud data warehouse environment to detect suspicious activity and potential security issues. How do you distinguish typical behavior from abnormal behavior? Are human users, machine users, or third parties accessing data in atypical ways?
  6. Misconfigurations: As with any platform, an improper configuration can expose your data and environment. Most data clouds provide encryption controls for data at rest and in transit. However, misconfigurations are common, especially where convenient data-sharing features make exposure easy. Validating configurations and continuously monitoring them for deviations is necessary, as is verifying that the company follows security best practices such as properly configured access controls and network policies. Misconfiguration is not limited to access and privilege settings: misconfigured notifications are also a common problem in many enterprises, letting bad behavior fly under the radar.
  7. Insecure Access Practices: Unfortunately, it is very common to find data validation scripts, Python code, notebooks, and client SQL scripts that contain sensitive information such as access keys, passwords, or confidential data. Practicing good hygiene with any code that handles sensitive data is essential.
  8. Insecure APIs: When using APIs to interact with your cloud data warehouse, ensure that proper authentication and authorization are in place and validate all incoming data. SQL injection, for example, is a common technique for exploiting code that builds queries directly from untrusted input.
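To make blindspot 1 more concrete, here is a minimal sketch that samples column values and tags each column whose values mostly match a simple sensitive-data pattern. The label names, regular expressions, and 80% threshold are hypothetical choices for illustration; real classifiers usually also weigh column names, metadata, and validation checks to reduce false positives.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical, illustrative patterns; a real taxonomy is organization- and industry-specific.
PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "CREDIT_CARD": re.compile(r"^\d{4}(?:[- ]?\d{4}){3}$"),
}


def classify_column(sample_values: Iterable[object], threshold: float = 0.8) -> Set[str]:
    """Return every label whose pattern matches at least `threshold` of the non-empty samples."""
    values = [str(v).strip() for v in sample_values if v is not None and str(v).strip()]
    if not values:
        return set()
    labels = set()
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            labels.add(label)
    return labels


def classify_table(columns: Dict[str, Iterable[object]]) -> Dict[str, Set[str]]:
    """Map each column name to the sensitivity labels detected in its sampled values."""
    return {name: classify_column(values) for name, values in columns.items()}


if __name__ == "__main__":
    sample = {
        "contact": ["alice@example.com", "bob@corp.example.org"],
        "tax_id": ["123-45-6789", "987-65-4321"],
        "notes": ["called customer", "follow up"],
    }
    print(classify_table(sample))
    # {'contact': {'EMAIL'}, 'tax_id': {'US_SSN'}, 'notes': set()}
```

Labels like these can then feed the access-control decisions the classification item describes.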
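For blindspot 2, one way to check least privilege is to compare what each identity has been granted with what it actually used during a review window. This sketch assumes both inputs were already exported (for example, from the platform’s grant listings and access history) into a hypothetical `"PRIVILEGE:object"` string format; it is not tied to any vendor’s API.

```python
from typing import Dict, Iterable, List


def excess_privileges(
    granted: Dict[str, Iterable[str]],
    used: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per user, privileges that were granted but never exercised.

    Privileges are plain strings in a hypothetical "PRIVILEGE:object" format.
    """
    report: Dict[str, List[str]] = {}
    for user, privileges in granted.items():
        # Set difference: anything granted that does not appear in the usage log.
        unused = set(privileges) - set(used.get(user, ()))
        if unused:
            report[user] = sorted(unused)
    return report


if __name__ == "__main__":
    granted = {
        "analyst": {"SELECT:sales.orders", "SELECT:hr.salaries"},
        "etl_service": {"INSERT:sales.orders"},
    }
    used = {
        "analyst": {"SELECT:sales.orders"},
        "etl_service": {"INSERT:sales.orders"},
    }
    print(excess_privileges(granted, used))
    # {'analyst': ['SELECT:hr.salaries']}
```

Unused grants are candidates for revocation, although a short review window can miss legitimate but infrequent access such as quarter-end reporting.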
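Blindspot 3 stresses knowing whether a share is, or can be, time-bound. Below is a minimal sketch of that check over an inventory of active shares. The `Share` record and its fields are hypothetical stand-ins for whatever share metadata your platform exposes, and the policy that external shares must expire is an example rather than a vendor rule.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Share:
    """Hypothetical record for one active share; use timezone-aware datetimes."""
    name: str
    consumer: str
    external: bool                  # shared outside the organization?
    expires_at: Optional[datetime]  # None means the share never expires


def review_shares(shares: List[Share], now: Optional[datetime] = None) -> List[Tuple[str, str]]:
    """Flag active shares that are not time-bound or have outlived their expiry."""
    now = now if now is not None else datetime.now(timezone.utc)
    findings: List[Tuple[str, str]] = []
    for share in shares:
        if share.expires_at is None:
            if share.external:
                findings.append((share.name, "external share has no expiry"))
        elif share.expires_at <= now:
            findings.append((share.name, "expired share is still active"))
    return findings


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    shares = [
        Share("partner_feed", "acme", external=True, expires_at=None),
        Share("q3_audit", "auditor", external=True,
              expires_at=datetime(2023, 12, 1, tzinfo=timezone.utc)),
        Share("internal_bi", "bi_team", external=False, expires_at=None),
        Share("pilot", "beta_partner", external=True,
              expires_at=datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ]
    print(review_shares(shares, now))
    # [('partner_feed', 'external share has no expiry'), ('q3_audit', 'expired share is still active')]
```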
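For blindspot 4, one partial answer to "where is the data going?" is to check export events against an allowlist of approved destinations and a volume ceiling. The sketch assumes a hypothetical `ExportEvent` feed derived from audit or query logs. The destination names and the one-million-row limit are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class ExportEvent:
    """Hypothetical export record extracted from audit or query logs."""
    user: str
    destination: str  # e.g. an external stage, bucket URL, or host
    rows: int


def flag_exfiltration(
    events: Iterable[ExportEvent],
    allowed_destinations: Set[str],
    max_rows: int,
) -> List[Tuple[str, str, List[str]]]:
    """Return (user, destination, reasons) for each export that breaks either rule."""
    alerts = []
    for event in events:
        reasons = []
        if event.destination not in allowed_destinations:
            reasons.append("unapproved destination")
        if event.rows > max_rows:
            reasons.append("unusually large export")
        if reasons:
            alerts.append((event.user, event.destination, reasons))
    return alerts


if __name__ == "__main__":
    allowed = {"s3://corp-analytics-exports"}
    events = [
        ExportEvent("alice", "s3://corp-analytics-exports", 5_000),
        ExportEvent("bob", "s3://personal-bucket", 200),
        ExportEvent("carol", "s3://corp-analytics-exports", 2_000_000),
    ]
    print(flag_exfiltration(events, allowed, max_rows=1_000_000))
    # [('bob', 's3://personal-bucket', ['unapproved destination']),
    #  ('carol', 's3://corp-analytics-exports', ['unusually large export'])]
```

Whether an alert like this triggers an automated response or a manual review is exactly the design question the insider-threats item raises.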
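Blindspot 5 asks how to tell typical behavior from abnormal behavior. A common starting point, sketched minimally here, is a per-identity baseline: flag any human, machine, or third-party identity whose activity today sits several standard deviations above its own history. The metric (rows read per day), the input dictionaries, and the threshold of 3 are assumptions for illustration. Real deployments generally need more signals and some handling of seasonality.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_anomalies(
    history: Dict[str, List[int]],
    today: Dict[str, int],
    z_threshold: float = 3.0,
) -> Dict[str, str]:
    """Compare each identity's activity today (e.g. rows read) with its own baseline."""
    flagged: Dict[str, str] = {}
    for identity, count in today.items():
        past = history.get(identity, [])
        if len(past) < 2:
            # Identities with no baseline (new users, new service accounts) deserve review.
            flagged[identity] = "no baseline"
            continue
        mu = mean(past)
        sigma = pstdev(past) or 1.0  # avoid dividing by zero when activity is perfectly steady
        z = (count - mu) / sigma
        if z >= z_threshold:  # only unusually high activity is flagged in this sketch
            flagged[identity] = f"z={z:.1f}"
    return flagged


if __name__ == "__main__":
    history = {
        "svc_etl": [1000, 1100, 950, 1050],
        "analyst": [200, 250, 180, 220],
    }
    today = {"svc_etl": 1080, "analyst": 50_000, "contractor": 10}
    print(flag_anomalies(history, today))  # flags "analyst" and "contractor", not "svc_etl"
```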
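Blindspot 6 calls for validating configuration and continuously monitoring for deviations. A minimal drift check compares a snapshot of current settings against a baseline of per-setting rules. Every setting name and rule here is hypothetical, and collecting the snapshot is platform-specific and not shown. Running the check on a schedule and alerting on any non-empty result is what makes it continuous.

```python
from typing import Any, Callable, Dict

# Hypothetical setting names; map them to your platform's actual configuration keys.
BASELINE: Dict[str, Callable[[Any], bool]] = {
    "require_mfa": lambda v: v is True,
    "network_policy": lambda v: v == "corp-only",
    "public_sharing_enabled": lambda v: v is False,
    "audit_log_retention_days": lambda v: isinstance(v, int) and v >= 365,
    "alert_on_admin_grant": lambda v: v is True,  # notification settings can drift too
}


def config_drift(
    actual: Dict[str, Any],
    baseline: Dict[str, Callable[[Any], bool]] = BASELINE,
) -> Dict[str, Any]:
    """Return each baseline setting that is missing or fails its rule, with its observed value."""
    return {
        key: actual.get(key)  # None when the setting is absent from the snapshot
        for key, rule in baseline.items()
        if not rule(actual.get(key))
    }


if __name__ == "__main__":
    snapshot = {
        "require_mfa": True,
        "network_policy": "any",
        "public_sharing_enabled": False,
        "audit_log_retention_days": 400,
    }
    print(config_drift(snapshot))
    # {'network_policy': 'any', 'alert_on_admin_grant': None}
```

Expressing each rule as a small predicate, rather than an exact expected value, lets settings like retention pass when they exceed the minimum.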
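For blindspot 7, a lightweight first line of defense is to scan scripts for credential-shaped strings before they are committed or shared. The rules below are a small illustrative subset: the `AKIA` prefix followed by 16 uppercase letters or digits is the well-known shape of a long-term AWS access key ID, while the other two rules and all function names are hypothetical. Treat matches as leads to review, since regex rules produce both false positives and misses.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Illustrative rules only; dedicated secret scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_password": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*['\"][^'\"]+['\"]"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for each line that appears to embed a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root: str, suffixes: Tuple[str, ...] = (".py", ".sql")) -> Dict[str, List[Tuple[int, str]]]:
    """Scan script files under `root`. Notebooks are JSON, so export them to scripts first."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results


if __name__ == "__main__":
    sample = (
        'AWS_KEY = "AKIAABCDEFGHIJKLMNOP"\n'   # fake key, right shape
        'password = "hunter2"\n'
        'token = os.environ["API_TOKEN"]\n'     # read at runtime: not flagged
    )
    print(scan_text(sample))
    # [(1, 'aws_access_key_id'), (2, 'hardcoded_password')]
```

The usual remedy for a hit is to move the value into a secrets manager or environment variable, as the `token` line in the sample does.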
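Blindspot 8 names SQL injection. The demo below uses Python’s built-in `sqlite3` module as a stand-in for a warehouse driver, purely so the example is self-contained; the table and function names are hypothetical. Most database drivers support parameter binding, though placeholder syntax varies (`?`, `%s`, `:name`). The contrast is between splicing input into the SQL text and passing it as a bound parameter.

```python
import sqlite3
from typing import List, Tuple


def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory table with sensitive-looking sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, ssn) VALUES (?, ?)",
        [("alice", "123-45-6789"), ("bob", "987-65-4321")],
    )
    return conn


def find_customer_unsafe(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # VULNERABLE: the caller's input becomes part of the SQL text itself.
    query = f"SELECT name, ssn FROM customers WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_customer_safe(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # Parameterized: the driver binds `name` as a value, so it cannot alter the query.
    return conn.execute("SELECT name, ssn FROM customers WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = setup()
    payload = "x' OR '1'='1"
    print(len(find_customer_unsafe(conn, payload)))  # 2: the WHERE clause is always true
    print(find_customer_safe(conn, payload))         # []: no customer has that literal name
```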

It is always good practice to stay up to date with your cloud warehouse vendor’s recommendations and best practices, and to continuously review and improve your organization’s security posture to eliminate as many security blindspots as possible. Recovering from a data breach is costly and damaging for any organization. Where possible, it is better to proactively mitigate the risk of fines, reputation loss, and trust erosion than to wait for an incident and then react.

One of the best ways to minimize data security blindspots in your cloud warehouse is to deploy Theom inside your data warehouse, lake, or store. Theom deploys in minutes with no agents, no proxies, and no impact on performance. Theom supports Snowflake, Databricks, AWS, and Azure data stores.

