Identity and Access Management

Quickly generating a dataset of fictitious Users using Randomised Real Data and PowerShell

Introduction

I’ve lost count of the number of times I’ve had the need to generate a representative dataset of users. Of course I have access to many production datasets but for many reasons they can’t be used. Finding previous datasets I’ve randomly generated always seems to take longer than it should, so with my most recent iteration of having to generate a fictitious list of users with Australian addresses, I’ve documented how I went about it, along with the source data I used and the script to create it.

Source Data

For my data sources to base my dataset off, I wanted representative data for Australia for both people names and locations. After a few quick searches I found;

  • that Data South Australia has lists of baby names for both male and female babies in SA. I downloaded the 2017 lists as CSV’s.
  • for Surname, also from Data South Australia I borrowed the 19th Century Arrivals list and manipulated the Fullname column to separate it on “,” then used the Excel Function to remove duplicates. I deleted all other columns so that I was left with just over 13,000 surnames in a CSV file.
  • Matthew Proctor’s list of Australian Postcodes as a CSV. This provides Postcode, Suburb and State.
  • Brisbane City Council (Australia’s largest Council) has a dataset with all bus locations that includes Street names as a CSV. Like I did for Surname I used the Excel Function to remove duplicates, removed the blanks and the other columns and then had just over 1600 street names.

The Script

The script is pretty simple. It imports each of the CSV’s listed above and generates a random number based on the number of records in each file.

The GitHub Repo contains the PowerShell script along with the source files. Change line 3 for the location where you store the CSV files and change line 66 for the number of users to generate. I’ve left the end of the script empty. I either insert the API call to create the users, or the PowerShell cmdlet with the data to do the creation depending on where I’m creating the users.

The Output

Here is a sample output in JSON format.

{
"Street": "370 Miskin St",
"Surname": "Burne",
"Suburb": "WOODBROOK",
"Postcode": "3451",
"State": "VIC",
"GivenName": "Miro"
}
{
"Street": "293 Preston Rd",
"Surname": "Partingale",
"Suburb": "MARRARA",
"Postcode": "812",
"State": "NT",
"GivenName": "Daniella"
}
{
"Street": "409 Orchard St",
"Surname": "Liaseyer",
"Suburb": "THURGOONA",
"Postcode": "2640",
"State": "NSW",
"GivenName": "Ariana"
}
{
"Street": "775 Station Rd",
"Surname": "Nevin",
"Suburb": "AVON DOWNS",
"Postcode": "862",
"State": "NT",
"GivenName": "Naria"
}

Summary

Using data publicly available and PowerShell it is possible to quickly generate a dataset of representative users and addresses. Generating other attributes is as easy as extrapolating from the existing data or supplementing it with additional source data files.

Darren Robinson

Bespoke learnings from a Microsoft Identity and Access Management Architect using lots of Microsoft Identity Manager, Azure Active Directory, PowerShell, SailPoint IdentityNow and Lithnet products and services.

View Comments

Recent Posts

Visualising your IP Address using PowerShell and AI

A few weeks back the Microsoft AI Tour was in Sydney Australia. There was a…

3 weeks ago

Where the heck is the PowerShell Module loading from?

If you're anything like me you always have PowerShell open, and often both PowerShell and…

4 months ago

Express Verified ID Setup

Decentralised Identity is a technology I'm passionate about and have written many posts and tools…

5 months ago

Orchestrating 1Password with PowerShell

Over two years ago I authored a PowerShell Module that enabled the automation of 1Password.…

8 months ago

Entra ID Tenant ID & Custom Domains PowerShell Module

Buried in my PowerShell Snippets Vol 4 post from 2021 is the PowerShell script and…

8 months ago

Windows Subsystem for Linux instance has terminated

Short post on how to recovery from "The Windows Subsystem for Linux instance has terminated"…

9 months ago

This website uses cookies.