Terraform + Redshift: Automating Data Warehouse Deployments (and a Few Fun Tricks Along the Way)

As a cloud architect, I’ve seen more teams lean into Infrastructure as Code (IaC) to manage their data platforms, and AWS Redshift is no exception. With Terraform, you can spin up entire Redshift environments, configure VPC networking, IAM roles, and security groups, and even automate user/role creation.

But this post isn’t just a dry walk-through. I’ll show you how to wire it all up and then share a few fun Redshift features I stumbled across that might make your SQL life more enjoyable (or at least more tolerable).

Why Use Terraform with Redshift?

AWS Redshift is powerful, but the console setup process is… less than ideal. Terraform fixes that.

Benefits of Terraforming Redshift:

  • Repeatability: One config file to rule them all.
  • Version control: Track changes in Git.
  • Modularity: Use reusable Terraform modules to deploy clusters in dev, staging, or prod.
  • Security: Define IAM roles and permissions declaratively, without manual error.

Quickstart: Provisioning a Redshift Cluster

Here’s a super basic example:

provider "aws" {
  region = "us-east-1"
}

resource "aws_redshift_cluster" "main" {
  cluster_identifier  = "my-redshift-cluster"
  database_name       = "analytics"
  master_username     = "admin"
  master_password     = "YourSecurePassword1"
  node_type           = "dc2.large"
  cluster_type        = "single-node"
  publicly_accessible = false
  iam_roles           = [aws_iam_role.redshift_role.arn]
}

resource "aws_iam_role" "redshift_role" {
  name = "redshift-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = {
        Service = "redshift.amazonaws.com"
      }
    }]
  })
}

This gets you a one-node cluster with IAM integration. From here, you can bolt on everything from VPC routing to monitoring and logging.

Terraform + SQL = A Beautiful Workflow

Once your cluster is live, it’s time to automate your SQL scripts too. Use a local-exec provisioner, or integrate your Terraform pipeline with something like Flyway, Liquibase, or just a simple script that runs psql or Redshift Data API commands after provisioning.
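As a rough illustration of that last option, here’s a minimal Python sketch that uses boto3 and the Redshift Data API to run a bootstrap statement against the cluster from the example above. The cluster identifier, database, and user come from that example; the SQL statement and the IAM permissions to call the Data API are assumptions you’d adapt to your own setup.

# Hypothetical post-provisioning bootstrap: run a SQL statement against the
# cluster created above via the Redshift Data API (no JDBC or psql needed).
import time
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

resp = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",     # from the Terraform example
    Database="analytics",
    DbUser="admin",
    Sql="CREATE SCHEMA IF NOT EXISTS staging;",  # placeholder bootstrap SQL
)

# Poll until the statement finishes
while True:
    status = client.describe_statement(Id=resp["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(2)

print(f"Bootstrap statement status: {status}")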

Fun Redshift/PostgreSQL Functions

Terraform is great, but let’s not forget why you’re here—SQL wizardry.

Here are a few Redshift functions or features that are surprisingly useful (and occasionally delightful):

1. LISTAGG – String Aggregation in a Single Row

SELECT department,
       LISTAGG(employee_name, ', ') WITHIN GROUP (ORDER BY employee_name) AS team
FROM employees
GROUP BY department;

Great for showing comma-separated team members by department.

2. GENERATE_SERIES – Fake Data for Free

Redshift doesn’t support this natively like Postgres, but you can emulate it:

WITH RECURSIVE series(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM series WHERE n < 100
)
SELECT * FROM series;

Useful for faking time ranges, populating calendars, etc.

3. PERCENTILE_CONT – Smooth Distribution Metrics

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;

Yes, you can finally stop explaining why AVG() isn’t the same as the median.

4. STL_SCAN & SVL_QUERY_REPORT – Query-Level Insight

Want to know how Redshift is scanning your data? These internal system views are gold for optimization.

Wrap Up

Terraform lets you take control of Redshift the same way you manage EC2, S3, or RDS. It’s scriptable, testable, and repeatable—which is exactly what you want when managing a data warehouse that powers business decisions.

And once you’re live, remember: Redshift has a few SQL tricks up its sleeve. Dig into its PostgreSQL heritage, and you’ll find some gems that make analytics more fun—or at least more tolerable at 3 AM when you’re debugging a query.

If you’re already Terraforming Redshift and have a favorite function or optimization tip, drop it in the comments. Or better yet—let’s turn it into a module.

Hardening Snowflake Access

While Snowflake provides strong default security capabilities, enterprises must take additional steps to lock down the identity and access layer to prevent unauthorized data access. In this article, we focus on front-end hardening, including authentication mechanisms, federated identity controls, network access constraints, and token lifecycle management. The goal is to ensure that only the right identities, coming from the right paths, under the right conditions, ever reach Snowflake.

Snowflake supports SAML 2.0 and OAuth 2.0-based SSO. You should disable native Snowflake usernames/passwords entirely for production accounts.

SAML (Okta, Azure AD, Ping)

  • Create an enterprise application in your IdP.
  • Configure SAML assertions to pass custom attributes like role, email, department.
  • Use SCIM provisioning to sync Snowflake users/roles/groups from the IdP.
  • Set the session timeout in your IdP lower than Snowflake’s to avoid dangling sessions.

Avoid provisioning ACCOUNTADMIN via SAML. Manage this break-glass account separately.

Use EXTERNAL_OAUTH instead of SAML for modern applications or service integrations.


Lock Down OAuth Access

Snowflake supports OAuth for both interactive and non-interactive use. You must restrict token scope, audience, and expiration windows to reduce risk.

Key Hardening Strategies:

  • Use short-lived tokens (preferably < 15 minutes) for all automation.
  • Configure audience (aud) and scopes strictly. Avoid issuing tokens with SESSION:ALL unless required.
  • Rotate client secrets for OAuth apps regularly.
  • For client credentials flow, ensure the client cannot impersonate high-privileged roles.
  • Deny ACCOUNTADMIN access through any OAuth app.

Supported Scopes Examples:

Scope                        Description
session:role:<role_name>     Restrict the token to a single role
session:warehouse:<wh>       Bind the session to a specific warehouse

Snowflake OAuth Token Validation (under the hood)

SELECT SYSTEM$EXTRACT_OAUTH_TOKEN_CLAIM('aud', '<access_token>');

Restrict Native Auth with Network Policies

Even if federated auth is enforced, native logins can still exist and be exploited. Use Snowflake Network Policies to harden this path.

Strategy:

  • Create two network policies:
    • admin_policy: Only allows break-glass IP ranges
    • default_policy: Allows enterprise proxy or VPN egress IPs only
  • Apply the default policy to the account (a fuller sketch follows below):

ALTER ACCOUNT SET NETWORK_POLICY = default_policy;

  • Apply admin_policy only to named break-glass users.

Consider a Cloudflare or Zscaler front-door to enforce geo/IP conditions dynamically.
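Here is a minimal Python sketch of the strategy above using snowflake-connector-python. The IP ranges, user name, role, and connection details are placeholders, not recommendations; adapt them to your own environment and run with a role that can manage network policies.

# Hypothetical sketch: create and apply the two network policies described above.
# IP ranges, user names, and connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",           # placeholder
    user="security_admin_user",     # placeholder
    authenticator="externalbrowser",
    role="SECURITYADMIN",
)
cur = conn.cursor()

# Break-glass policy: only the emergency IP range
cur.execute(
    "CREATE NETWORK POLICY IF NOT EXISTS admin_policy "
    "ALLOWED_IP_LIST = ('198.51.100.0/28')"
)

# Default policy: enterprise proxy / VPN egress ranges only
cur.execute(
    "CREATE NETWORK POLICY IF NOT EXISTS default_policy "
    "ALLOWED_IP_LIST = ('203.0.113.0/24')"
)

# Careful: confirm your current egress IP is inside default_policy before this,
# or you can lock yourself out of the account.
cur.execute("ALTER ACCOUNT SET NETWORK_POLICY = default_policy")

# Stricter policy on the named break-glass user only
cur.execute("ALTER USER breakglass_admin SET NETWORK_POLICY = admin_policy")

cur.close()
conn.close()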


Implement Role Hierarchy Discipline

Identity hardening isn’t complete without strict RBAC practices. Prevent role escalation and sprawl.

Best Practices:

  • Disallow use of SET ROLE = ANY unless strictly required.
  • Strip the PUBLIC role of all privileges:

REVOKE ALL PRIVILEGES ON DATABASE <db> FROM ROLE PUBLIC;

  • Ensure OAuth apps assume dedicated roles, not shared ones.
  • Build role chains on the principle of least privilege: don't nest high-privilege roles into commonly used ones.

Enable MFA Everywhere You Can

Snowflake itself doesn’t natively enforce MFA — this must be handled at the IdP or via authentication proxy.

Solutions:

  • Require MFA on all interactive IdP logins (Okta, Azure, Ping).
  • Use conditional access policies to block access if MFA isn’t passed.
  • For break-glass accounts: Use hardware token MFA or a privileged access broker like CyberArk.

Monitor and Rotate OAuth Secrets & Certificates

Snowflake OAuth integrations (especially EXTERNAL_OAUTH) rely on JWT signing keys.

Operational Controls:

  • Rotate JWT signing certs every 90 days.
  • Monitor for expired or invalid tokens using:

SHOW INTEGRATIONS;

  • Alert on integration usage via LOGIN_HISTORY or EVENT_HISTORY.

Harden Service Account Behavior

Service identities are often the weakest link. Use automation to provision/deprovision service roles and tokenize all secrets via a secrets manager like Vault.

Key Points:

  • Never let a service identity own SECURITYADMIN or ACCOUNTADMIN.
  • Tag tokens via session:user_agent, session:client_ip and audit the usage patterns.
  • For zero trust, bind each service to its own network segment and OAuth flow.

Monitor Everything at the Edge

Don’t trust Snowflake alone for auditing — bring data into your SIEM.

Ingest from:

  • LOGIN_HISTORY
  • QUERY_HISTORY
  • ACCESS_HISTORY
  • EVENT_HISTORY

Pipe into Splunk, Snowflake, or S3 via Snowpipe and monitor:

  • Role switching anomalies
  • Login attempts outside normal hours
  • OAuth token usage per integration
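If a full SIEM pipeline isn't in place yet, even a small polling script against ACCOUNT_USAGE goes a long way. Here's a hedged sketch: connection details and the monitoring role are placeholders, and keep in mind that ACCOUNT_USAGE views can lag by up to a couple of hours.

# Minimal sketch: pull recent failed logins from ACCOUNT_USAGE for SIEM ingestion
# or ad-hoc review. Connection details and role are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",         # placeholder
    user="monitoring_user",       # placeholder
    authenticator="externalbrowser",
    role="SECURITY_MONITOR",      # assumed read-only monitoring role
)
cur = conn.cursor()

cur.execute("""
    SELECT event_timestamp, user_name, client_ip,
           first_authentication_factor, error_message
    FROM snowflake.account_usage.login_history
    WHERE is_success = 'NO'
      AND event_timestamp > DATEADD('hour', -24, CURRENT_TIMESTAMP())
    ORDER BY event_timestamp DESC
""")

for event_time, user, ip, auth_factor, error in cur:
    print(f"{event_time} {user} from {ip} ({auth_factor}): {error}")

cur.close()
conn.close()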

In Defense of Integrity: Standing with Chris Krebs

In an age where cybersecurity threats are relentless and disinformation moves faster than truth, we need leaders who are brave enough to speak facts—even when it’s inconvenient. Chris Krebs, the former director of the Cybersecurity and Infrastructure Security Agency (CISA), has been that kind of leader from day one.

Krebs didn’t seek fame. He didn’t seek a fight. He sought the truth.

As director of CISA, he led one of the most critical missions in modern government: protecting the integrity of U.S. elections and critical infrastructure. Under his leadership, CISA declared the 2020 election “the most secure in American history”—a statement backed by career security experts, intelligence assessments, and hard data.

That statement, grounded in evidence, got him fired.

And now, years later, in a deeply concerning escalation, the current administration has reportedly revoked his security clearance and ordered an investigation into his work at CISA. Let’s be clear—this isn’t about security. It’s about political revenge.

Krebs has since continued to serve the public good, both as co-founder of the Krebs Stamos Group and in his role at SentinelOne. He remains one of the few voices in the field who speaks plainly, refuses to bend to political pressure, and puts the country before career.

If we want to live in a world where facts matter, where professionals are empowered to do the right thing, and where public servants don’t fear retaliation for speaking truth, then we must stand by Chris Krebs.

This isn’t about party. It’s about principle.

We owe our respect—and our support—to those who prioritize the safety of the country over the safety of their own jobs. Krebs did exactly that.

And we should all be damn grateful he did.

Terraform Cloud with Vault

Messing around with Terraform this weekend, I dove into some new functionality for storing data in HashiCorp Vault, and I was blown away by how much I could automate using Terraform Cloud. The integration between these two tools has helped me automate a lot in my home lab, making it more efficient and secure.

Simplifying Secrets Management with Vault

HashiCorp Vault is a powerful tool for securely storing and accessing secrets. It provides a centralized way to manage sensitive data, such as API keys, passwords, and certificates. Vault’s dynamic secrets feature is particularly impressive, allowing for the automatic generation and rotation of secrets. This significantly reduces the risk of secret sprawl and unauthorized access.

Automating Infrastructure with Terraform Cloud

Terraform Cloud is a robust platform for infrastructure as code (IaC) management. It enables teams to collaborate on Terraform configurations, providing a consistent and reliable way to manage infrastructure. Terraform Cloud’s powerful automation capabilities allow for the continuous integration and deployment of infrastructure changes, ensuring that environments are always up-to-date and compliant.

Unleashing the Potential of Terraform Cloud and Vault

Combining Terraform Cloud with HashiCorp Vault has been a game-changer for my projects. Here’s how I utilized these tools over the weekend:

  1. Automated Secrets Storage: Using Terraform Cloud, I automated the process of storing and managing secrets in Vault. This eliminated the manual steps typically required, ensuring that secrets are securely stored and easily accessible when needed.
  2. Dynamic Secret Generation: I leveraged Vault’s ability to generate dynamic secrets, automating the creation of temporary credentials for various services. This not only improved security but also simplified the management of credentials.
  3. Infrastructure Provisioning: With Terraform Cloud, I automated the provisioning of infrastructure components that require access to secrets. By integrating Vault, these components could securely retrieve the necessary credentials without hardcoding them in configuration files (see the retrieval sketch after this list).
  4. Policy Management: I used Terraform Cloud to define and manage Vault policies, ensuring that the right permissions were in place for different users and applications. This centralized approach made it easier to enforce security best practices across the board.
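To sanity-check the wiring, I like to confirm that what Terraform wrote to Vault is actually retrievable. Here's a minimal Python sketch using the hvac client; the KV mount point, secret path, and environment variables are assumptions from my lab, not anything Terraform Cloud requires.

# Minimal sketch: read back a KV v2 secret that Terraform wrote to Vault.
# VAULT_ADDR/VAULT_TOKEN and the secret path are placeholders from my lab.
import os
import hvac

client = hvac.Client(
    url=os.environ["VAULT_ADDR"],    # e.g. https://vault.example.internal:8200
    token=os.environ["VAULT_TOKEN"],
)

secret = client.secrets.kv.v2.read_secret_version(
    mount_point="secret",            # default KV v2 mount (assumed)
    path="homelab/db",               # hypothetical path written by Terraform
)

db_creds = secret["data"]["data"]
print(f"Retrieved keys: {list(db_creds.keys())}")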

Happy automating!

Backing Up PyTorch Settings


Backing Up Settings with Python Scripting

PyTorch stands out as one of the most popular frameworks due to its flexibility, ease of use, and dynamic computation graph. Managing settings and configurations across different experiments or projects, however, can sometimes become a cluster f*@%. In this post, I’ll explain a streamlined approach to managing settings in PyTorch using Python scripting, allowing for easy backup and retrieval of configurations.

Understanding the Importance of Settings Management:

  • In any machine learning project, experimentation involves tweaking various hyperparameters, model architectures, and training configurations.
  • Keeping track of these settings is crucial for reproducibility, debugging, and fine-tuning models.
  • Manual management of settings files or notebooks can lead to errors and inefficiencies, especially when dealing with multiple experiments or collaborators.

Leveraging Python for Settings Backup:

  • Python’s versatility makes it an ideal choice for automating repetitive tasks, such as backing up settings.
  • We can create a script that parses relevant settings from our PyTorch code and stores them in a structured format, such as JSON or YAML.

Designing the Backup Script:

  • Define a function to extract settings from PyTorch code. This may involve parsing configuration files, command-line arguments, or directly accessing variables.
  • Serialize the settings into a suitable format (e.g., JSON).
  • Implement a mechanism for storing the settings, such as saving them to a file or uploading them to a cloud storage service.
  • Optionally, add functionality for restoring settings from a backup (a small restore helper is sketched after the example below).

Here is a simple example:

import json

def extract_settings():
    # Example: extract settings from your PyTorch training code
    settings = {
        'learning_rate': 0.001,
        'batch_size': 32,
        'num_epochs': 10,
        # Add more settings as needed
    }
    return settings

def backup_settings(settings, filepath):
    with open(filepath, 'w') as file:
        json.dump(settings, file)

def main():
    settings = extract_settings()
    backup_settings(settings, 'settings_backup.json')
    print("Settings backup complete.")

if __name__ == "__main__":
    main()
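And if you want the optional restore path mentioned above, a matching helper is only a few lines (assuming the same JSON layout):

def restore_settings(filepath):
    # Load a previously backed-up settings dict (same JSON layout as above)
    with open(filepath, 'r') as file:
        return json.load(file)

# Example usage:
# settings = restore_settings('settings_backup.json')
# print(settings['learning_rate'])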

Vault is not an HSM…

Introduction: In the ever-evolving landscape of data security, understanding the tools at our disposal is crucial. Two such tools, HashiCorp Vault and Hardware Security Modules (HSMs), often get mentioned in the same breath but serve distinctly different purposes. This blog post aims to demystify these technologies, highlighting why Vault is not an HSM and how they complement each other in securing our digital assets.


What is HashiCorp Vault? HashiCorp Vault is a software-based secrets management solution. It’s designed to handle the storage, access, and management of sensitive data like tokens, passwords, certificates, and encryption keys. Vault’s strengths lie in its versatility and dynamic nature, providing features like:

  • Dynamic Secrets: Generating on-demand credentials that have a limited lifespan, thus minimizing risks associated with static secrets.
  • Encryption as a Service: Allowing applications to encrypt and decrypt data without managing the encryption keys directly (see the example after this list).
  • Robust Access Control: Offering a range of authentication methods and fine-grained access policies.
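To make the Encryption as a Service point concrete, here is a minimal Python sketch using the hvac client against Vault's transit engine. It assumes the transit engine is enabled and a key named app-key already exists; both are assumptions for illustration, not defaults.

# Minimal sketch: encrypt/decrypt with Vault's transit engine; the key never
# leaves Vault. Assumes transit is enabled and a key named "app-key" exists.
import base64
import os
import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

plaintext = base64.b64encode(b"credit-card-number").decode()
encrypted = client.secrets.transit.encrypt_data(name="app-key", plaintext=plaintext)
ciphertext = encrypted["data"]["ciphertext"]      # e.g. "vault:v1:..."

decrypted = client.secrets.transit.decrypt_data(name="app-key", ciphertext=ciphertext)
original = base64.b64decode(decrypted["data"]["plaintext"])
print(original)  # b"credit-card-number"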

What is a Hardware Security Module (HSM)? An HSM is a physical device focused on protecting cryptographic keys and performing secure cryptographic operations. Key aspects include:

  • Physical Security: Built to be tamper-resistant and safeguard cryptographic keys even in the event of physical attacks.
  • Cryptographic Operations: Specialized in key generation, encryption/decryption, and digital signing, directly within the hardware.
  • Compliance-Ready: Often essential for meeting regulatory standards that require secure key management.

Key Differences:

  1. Nature and Deployment:
    • Vault is a flexible, software-based tool deployable across various environments, including cloud and on-premises.
    • HSMs are physical, tamper-resistant devices, providing a secure environment for cryptographic operations.
  2. Functionality and Scope:
    • Vault excels in managing a wide range of secrets, offering dynamic secrets generation and encryption services.
    • HSMs focus on securing cryptographic keys and performing hardware-based cryptographic functions.
  3. Use Case and Integration:
    • Vault is suitable for organizations needing a comprehensive secrets management system with flexible policies and integrations.
    • HSMs are ideal for scenarios requiring high-assurance key management, often mandated by compliance standards.

Why Vault is Not an HSM: Simply put, Vault is not an HSM because it operates in a different realm of data security. Vault is a software layer providing a broad spectrum of secrets management capabilities. It doesn’t offer the physical security inherent in HSMs but excels in managing access to secrets and encrypting data. Conversely, HSMs provide a hardened, secure environment for cryptographic operations but don’t have the extensive management features of Vault.


Complementary, Not Competitive: In a comprehensive security strategy, Vault and HSMs are not competitors but collaborators. Vault can integrate with HSMs to leverage their physical security for key storage, combining the best of both worlds: the flexibility and extensive management of Vault with the robust, physical security of HSMs.


Streamlining Presentations: The Power of Automation in PowerPoint Data Generation

Creating the perfect PowerPoint presentation is an art—an equilibrium between compelling content and striking visuals. However, for professionals and developers who need to test the efficiency of co-authoring tools or presentation software, the content itself can sometimes be secondary to the functionality being tested. That’s where the power of automation comes in, particularly in generating mock data for PowerPoint presentations.

I’ve been working on a fun side project. It’s a script that lets users create ‘fake’ PowerPoint data to simulate various scenarios and test how long it takes to read through the content, in a process akin to co-authoring. For those intrigued by how this automation operates and its potential benefits, you can delve into the details on my GitHub repository.

Why Automate PowerPoint Data Generation?

The reasons for automating data generation are numerous, especially in a corporate or development setting:

  • Testing Efficiency: For software developers and IT professionals, having a tool that automatically generates data can significantly aid in testing the efficiency of co-authoring tools and other collaborative features in presentation software.
  • Training: Automated mock presentations can serve as training material for new employees, helping them get acquainted with presentation tools and company-specific templates.
  • Benchmarking: By standardizing the length and complexity of the generated content, teams can benchmark the performance of their software or the productivity of their staff.

How Does the Automation Work?

The automation script I developed is designed to be intuitive. It populates PowerPoint slides with random text, images, and data. The script takes into account different factors like text length and complexity, mimicking real-world presentations without the need for manual data entry.

Moreover, I incorporated a timing mechanism to assess how long a ‘co-authoring’ read-through would take. This feature is invaluable for software developers who aim to improve the collaborative aspects of presentation tools.
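The actual script lives in the repo, but the core idea is simple enough to sketch here with python-pptx. The slide count, word counts, and the 200-words-per-minute reading rate below are arbitrary assumptions for illustration, not what the real script uses.

# Rough sketch of the idea (the real script is on GitHub): generate a deck of
# slides filled with random text, then estimate a read-through time.
import random
import string
from pptx import Presentation

def random_sentence(words=12):
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 9)))
        for _ in range(words)
    )

prs = Presentation()
num_slides, bullets_per_slide = 10, 5                    # arbitrary test sizes
for i in range(num_slides):
    slide = prs.slides.add_slide(prs.slide_layouts[1])   # "Title and Content" layout
    slide.shapes.title.text = f"Slide {i + 1}: {random_sentence(4)}"
    slide.placeholders[1].text = "\n".join(
        random_sentence() for _ in range(bullets_per_slide)
    )
prs.save("mock_deck.pptx")

# Very rough co-authoring read-through estimate at ~200 words per minute
total_words = num_slides * (4 + bullets_per_slide * 12)
print(f"Estimated read-through: {total_words / 200:.1f} minutes")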

It’s up now on my GitHub.

Terraform learning

As someone who hasn’t been using Terraform for years, some of what I’m about to say will be obvious to those of you who have: you likely already know that it’s a powerful infrastructure-as-code (IaC) tool that allows you to automate the provisioning and management of your cloud resources. With Terraform, you can define your infrastructure using a declarative language, and then use that definition to create, update, and destroy your resources in a consistent and repeatable way.

It has been a fantastic tool to get to know. Most fun I’ve had in technology in a long time.

One of the key benefits of using Terraform is that it allows you to abstract away the complexity of the underlying cloud APIs and services. Instead of having to write custom scripts or manually configure each individual resource, you can define your infrastructure in a high-level, human-readable format that can be version-controlled and shared with your team. This makes it easier to collaborate, track changes, and ensure consistency across your infrastructure.

Terraform also provides a number of built-in features and plugins that make it easy to work with a wide range of cloud providers, services, and tools. For example, you can use Terraform to provision infrastructure on AWS, Azure, Google Cloud, and many other cloud providers. Additionally, Terraform supports a wide range of resource types, including compute instances, load balancers, databases, and more.

Another benefit of using Terraform is that it allows you to automate your infrastructure changes with confidence. Because Terraform is declarative, you can see exactly what changes will be made to your infrastructure before you apply them. This helps you avoid unexpected changes and ensures that your infrastructure remains stable and secure.

Terraform is a fantastic tool for automating your infrastructure and managing your cloud resources. Whether you’re working on a small project or a large-scale enterprise deployment, Terraform can help you achieve your goals quickly and efficiently.

Figuring out DKIM

I often wonder why more companies haven’t rolled out DKIM at this point, as it clearly fixes so many phishing and spam issues.

DKIM, which stands for DomainKeys Identified Mail, is an email authentication method designed to detect email spoofing and phishing. It works by allowing an organization to attach a digital signature to an email message, which can be validated by the recipient’s email server. DKIM is an important security feature for any organization that sends email, as it helps to prevent fraudulent emails from being delivered to the recipient’s inbox.
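If you're curious whether a given domain publishes DKIM keys at all, a quick DNS lookup will tell you. Here's a small sketch using dnspython; the selector names are the Microsoft 365 defaults and won't apply to every provider, so treat them as assumptions.

# Quick check: does a domain publish DKIM public keys for the usual
# Microsoft 365 selectors? Selector names vary by provider.
import dns.resolver

def check_dkim(domain, selectors=("selector1", "selector2")):
    for selector in selectors:
        name = f"{selector}._domainkey.{domain}"
        try:
            answers = dns.resolver.resolve(name, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            print(f"{name}: no DKIM record published")
            continue
        for record in answers:
            txt = b"".join(record.strings).decode()
            print(f"{name}: {txt[:60]}...")

check_dkim("example.com")   # replace with your domain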

In Office 365 and Exchange Online, not using DKIM can pose several dangers. Here are a few of them:

  1. Increased risk of phishing attacks: Phishing attacks are a type of cyber attack that involve tricking users into revealing sensitive information, such as login credentials or credit card details. Without DKIM, it becomes easier for attackers to impersonate legitimate senders and convince recipients to provide their personal information.
  2. Increased risk of email spoofing: Email spoofing is when an attacker sends an email that appears to be from a legitimate sender, but is actually from a fraudulent source. DKIM helps to prevent email spoofing by verifying that the email actually came from the sender’s domain. Without DKIM, it becomes easier for attackers to impersonate legitimate senders and deceive recipients.
  3. Increased risk of email tampering: Without DKIM’s digital signature, the recipient has no way to verify that a message’s headers and body weren’t modified in transit, so altered emails are indistinguishable from legitimate ones. (Note that DKIM does not encrypt mail; transport encryption is still needed to protect against interception.)
  4. Decreased email deliverability: Many email providers, including O365, use DKIM as a factor in their spam filtering algorithms. Without DKIM, emails may be more likely to be flagged as spam or rejected by the recipient’s email server, resulting in decreased email deliverability.

Not using DKIM in O365 poses several dangers, including an increased risk of phishing attacks, email spoofing, and in-transit tampering, as well as decreased email deliverability. Therefore, it is highly recommended that organizations use DKIM to help ensure the security and authenticity of their email communications.