Account Info File to PostgreSQL

In this project we’ll

  • Extract data from a delimited file containing: password, userid …
  • Transform text data.
  • Load data into a PostgreSQL

Extract

  • Extract the fields
    • field 1 (user name)
    • field 2 (user id)
    • field 6 (home directory path) from /etc/passwd
cut -d":" -f1,3,6 /etc/passwd

Transform

  • Transform : to ,
  • save data in transformed-data.csv
extracted-data.txt | tr ":" ","   > transformed-data.csv

Start PostgreSQL

  1. From the SkillsNetwork tools, under Databases choose PostgresSQL Database server and click Start to start the server. This will take a few mins.
  2. Click PostgresSQL CLI on the screen to start interacting wit the PostgresSQL server.
  3. This will start the interactive psql client which connects to the PostgreSQL server with postgres=# prompt as shown below.
theia@theiadocker-emhrcf:/home/project$ export PGPASSWORD=M8fYujLI4OSDHqtXUU2VVJ4N;
psql --host 172.21.64.88 -p 5432 -U postgres
psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1), server 13.2)
Type "help" for help.

postgres=# 

Create Table

We’ll create a table called users in the PostgreSQL database using PostgresSQL CLI. This table will hold the user account information.

The table users will have the following columns:

  • username
  • uid
  • home
  • You will connect to template1 database which is already available by default. To connect to this database, run the following command at the ‘postgres=#’ prompt.
\c template1
  • You will get the following message.
You are now connected to database "template1" as user "postgres".
  • Also, your prompt will change to ‘template1=#’.
  • Run the following statement at the ‘template1=#’ prompt to create the table.
create table users(username varchar(50),userid int,homedirectory varchar(100));
  • If the table is created successfully, you will get the message below.
CREATE TABLE

Load

Data into Table

  • Open a new terminal
  • In the terminal, run the following command to create a new shell script named csv2db.sh.
touch csv2db.sh
  • Open the file in the editor.
  • Copy and paste the following lines into the newly created file.
# Extracts data from /etc/passwd file into a CSV file.

# The csv data file contains the user name, user id and
# home directory of each user account defined in /etc/passwd

# Transforms the text delimiter from ":" to ",".
# Loads the data from the CSV file into a table in PostgreSQL database.
  • Save the file by presseing Ctrl+s or by using the File->Save menu option.

  • You need to add lines of code to the script that will xtract user name (field 1), user id (field 3), and home directory path (field 6) from /etc/passwd file using the cut command.

  • Copy the following lines and paste them to the end of the script and save the file.

# Extract phase

echo "Extracting data"

# Extract the columns 1 (user name), 2 (user id) and 
# 6 (home directory path) from /etc/passwd

cut -d":" -f1,3,6 /etc/passwd
  • Run the script
bash csv2db.sh
  1. Verify that the output contains the three fields, that you extracted.

  2. Change the script to redirect the extracted data into a file named extracted-data.txt

  3. Replace the cut command at end of the script with the following command.

cut -d":" -f1,3,6 /etc/passwd > extracted-data.txt
  • Run the script
bash csv2db.sh
  • Run this command to verify the file extracted-data.txt is created, and has the content
cat extracted-data.txt
  • The output shows the extracted columns are separated by the original “:” delimiter. You need to convert this into a “,” delimited file.
  • Add the below lines at the end of the script and save the file.
# Transform phase
echo "Transforming data"
# read the extracted data and replace the colons with commas.

tr ":" "," < extracted-data.txt  > transformed-data.csv
  • Run the script
bash csv2db.sh`
  • Run the command below to verify that the file transformed-data.csv is created, and has the content.
cat transformed-data.csv
  • To load data from a shell script, you will use the psql client utility in a non-interactive manner. This is done by sending the database commands through a command pipeline to psql with the help of echo command.

PostgreSQL command to copy data from a CSV file to a table is COPY.

The basic structure of the command which we will use in our script is,

COPY table_name FROM 'filename' DELIMITERS 'delimiter_character' FORMAT;

Now, add the lines below to the end of the script ‘csv2db.sh’ and save the file.

# Load phase
echo "Loading data" 
# Set the PostgreSQL password environment variable. 
# Replace <yourpassword> with your actual PostgreSQL password. export PGPASSWORD=<yourpassword>; 
# Send the instructions to connect to 'template1' and 
# copy the file to the table 'users' through command pipeline.
echo "\c template1;\COPY users  FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV;" | psql --username=postgres --host=postgres
  • Run the script
bash csv2db.sh
  • Now, add the line below to the end of the script ‘csv2db.sh’ and save the file.
echo "SELECT * FROM users;" | psql --username=postgres --host=postgres template1

Run the script

bash csv2db.sh

Entire script file

# Extracts data from /etc/passwd file into a CSV file.

# The csv data file contains the user name, user id and
# home directory of each user account defined in /etc/passwd

# Transforms the text delimiter from ":" to ",".
# Loads the data from the CSV file into a table in PostgreSQL database.
# Extract phase

echo "Extracting data"

# Extract the columns 1 (user name), 2 (user id) and 
# 6 (home directory path) from /etc/passwd

cut -d":" -f1,3,6 /etc/passwd
# Transform phase
echo "Transforming data"
# read the extracted data and replace the colons with commas.

tr ":" "," < extracted-data.txt  > transformed-data.csv
# Load phase
echo "Loading data"
# Set the PostgreSQL password environment variable.
# Replace <yourpassword> with your actual PostgreSQL password.
export PGPASSWORD=<yourpassword>;
# Send the instructions to connect to 'template1' and
# copy the file to the table 'users' through command pipeline.
echo "\c template1;\COPY users  FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV;" | psql --username=postgres --host=postgres
echo "SELECT * FROM users;" | psql --username=postgres --host=postgres template1