Efficient CSV Data Extraction with Vanilla PHP: A Comprehensive Tutorial

Efficient CSV Data Extraction with Vanilla PHP: A Comprehensive Tutorial

Master Data Extraction from CSV Files and Showcase it on Another Page with Pure PHP

A while back, I found myself entrusted with a Laravel codebase, faced with the exciting challenge of upgrading the project to the latest version of Laravel and a higher PHP version.
During the upgrade, a previously functional feature, which extracted data from a CSV file and uploaded it to a database, suddenly stopped working.

This feature relied on an external Laravel package that I found too cumbersome for its simple purpose. After investigating, I realized the package was deprecated and incompatible with the new PHP and Laravel versions.

I attempted to find alternative Laravel packages for the task, but they were all too complex for my needs. Consequently, I decided to create my own lightweight and straightforward CSV extractor to get the job done.

In this article, you will discover how to extract data from a CSV document and present it on a web page. To maintain simplicity, I have employed procedural PHP programming patterns, ensuring that even beginners can easily follow and comprehend the process.

Prerequisites

  1. Have PHP installed on your machine

  2. Basic understanding of HTML and CSS

  3. Basic Understanding of PHP

  4. Basic Understanding of PHP arrays

What do we need for this project?

  1. We need a form that will enable users to upload the CSV file

  2. We need a page to display our extracted data

  3. Lastly, we need a dedicated folder to store uploaded CSV files

To get started we need to create a folder/directory for our project. I am a big fan of proper organization of things and separation of concern when I am working on a project regardless of how big or small the project is.

Here is the project folder structure for this application

You can follow the directory structure I provided here or create yours, what ever works for you is fine.

Let's add a form inside our index.php file to provide a channel for users to upload a CSV file.

<form action="core/extractor.php" method="POST" enctype="multipart/form-data">
        <div style="padding: 10px; border: 1px solid #000;">
            <div>
                <label for="csvFile">CSV File:</label>
                <input type="file" id="csvFile" name="csfile", accept=".csv" required>
            </div>
            <hr>
            <div>
                <label for="csvFile">What to extact</label>
                <select name="type" id="extractType">
                    <option disabled selected>Choose what to extract</option>
                    <option value="3">Extract Header and Content</option>
                    <option value="1">Extract Header</option>
                    <option value="2">Extract Content</option>
                </select>
            </div>
        </div>
        <br>
        <button type="submit">Submit</button>
    </form>

If you look at our form action, it points us to the extractor service in the core directory.

Inside this extraction service file, we will have one major operation which is Form Processing.

This Form Processing will encompass 4 different activities which are:

  1. Form input validation

  2. Redirection

  3. File uploading

  4. Data extraction from uploaded CSV file

<?php

//Process form request

if($_SERVER["REQUEST_METHOD"] === "POST"){
    // pass each input into a variable 
    $file = $_FILES['csfile']; 
    $type =  $_POST['type'];

    //validate form inputs
    $validate = validateForm($file, $type);

    if($validate !== true){
        $errorMsg = $validate;
        redirect('error',$errorMsg);
    }
}

Let's take a look at the form processing code above.

We need to retrieve the request that was submitted through our form and also check the method of submission.

In other to achieve that we used the PHP $server superglobals, the PHP $server superglobal has many flavors since it offers filter keys that we can use to access some of the server resources, one of which is the REQUEST_METHOD filter.

This REQUEST_METHOD helps us to check the mode or method of form submission from the user. In our case, we are restricting them to only submit data using the post method which is already defined in our form method.

Form input validation

After checking for the submission method we need to validate that the data submitted is what we need hence a validation function was created to do that.

function validateForm($file, $type){
    //check if input submited are empty
    if(!isset($type) && !isset($file)){
        return "<span style='color:red'> all fields are required</span>";
    }
    if($file['size'] > 20024){
        return  "<span style='color:red'> File size must not be greater than 100kb</span>";
    }

    if($file['type'] !== 'text/csv'){
        return  "<span style='color:red'> File must be a .CSV file</span>";
    }

    return true;

}

I like to keep things separated, so I created the validation function separately.

In our HTML form, we require two inputs from our users which are

  1. The CSV file

  2. Extraction type

Therefore our validation function is going to accept two arguments.

Inside the validation, we are checking for empty input values, if we encounter any, we through an error for the user. How do we do that? this is where our redirection function comes into action.

Redirection

function redirect($responseType, $responseData){
    if(!in_array($responseType, ['error', 'success'])){
        die("<p style='color: red; font-weight: bold'>Response type is undefined</p>");
    };


    if(is_array($responseData)){
        $data = json_encode($responseData);
    }else{
        $data = $responseData;
    }

    $url = $_SERVER['HTTP_ORIGIN']. '/projects/csv_extractor/';   

    $urlparam = '?'. $responseType.'='.$resType;

    $page = ($responseType == 'success')? 'resultPage.php' : 'index.php' ;

    $url.= $page. $urlparam;

    if($resType == 'error'){
        $_SESSION['error'] =  $data;
    }else{
        $_SESSION['success'] =  $data;
    }


    header("Location: $url");
    exit();
}

Our Redirection function takes in, two arguments response type and response data.

We made sure that our response type falls between two categories, Error and Success. This was made possible utilizing the PHP in_array method, this method checks if a given value matches any item specified within array values.

The next important section of the redirect function is the URL construction part where we get the domain name of the website using it as the base origin to construct our redirect URL.

We used the PHP short syntax for the if statement to check the response type.

if the response type is an error we redirect the user back to the previous page if it is a success we push the user to the result page.

Another important part of this redirect function is utilizing a PHP session superglobal to pass data across pages. In the redirection function, we push our response data to the session variable, that way we can access the data in any part of our application.

File uploading

After the validation, if there are no validation errors we move to uploading the submitted file to the server. As usual, we will create a function that will perform this upload. This function will require two arguments, firstly the file we want to store, and secondly the path/location where we want to store this file.

function uploadFile($file, $filepath) {

    if(move_uploaded_file($file["tmp_name"], $filepath)){
        return true;
    }

    return false;
}

In our function above we are using the PHP move_uploaded_file() function to move the file from a temporary directory to the location we specified in the function arguments.

We can access this temporary directory using the PHP $FILES superglobal variable. This variable provides some filter keys we can use to access in-depth information about the file uploaded to the server, one of these keys is the ["tmp_name"] tmp_name provides the temporary location of the file on the server.

Let me explain some concepts here.

Whenever a file is uploaded to the server, the server creates a temporary directory to hold the uploaded file before the application calls up the file for processing and proper storage. After the application has properly processed and stored the file, this temporary directly is cleaned up to free server space.

Now before we call the upload function within our form processing code block we need to prepare the location where we want to store the file.

So far, our extractor is gradually coming into shape, as of right now, this is what our extractor.php should look like.

if($_SERVER["REQUEST_METHOD"] === "POST"){

   // pass each input into a variable 

    $file = $_FILES['csfile'];

    $type =  $_POST['type'];

    //validate form inputs
    $validate = validateForm($file, $type);

    if($validate !== true){
        $errorMsg = $validate;
        redirect('error',$errorMsg);
    }

    // prepare file for upload
    $filename = $_FILES['csfile']['name'];
    $targetFolder = dirname( __FILE__, 2).'/uploadedCSV/';
    $filepath = $targetFolder.basename($filename);


    //uploadFile
    if(!uploadFile($file, $filepath)){
        $errorMsg = "<span style='color: red'>Error occured while uploading file</span>";
        redirect('error',$errorMsg);

    }
}

Data extraction from uploaded CSV file

We have now come to the most interesting part of this article. if you have been following through from the beginning I appreciate your time and effort.

Because we have a data extraction type in our form we are going to create different extraction methods, In the HTML form we have 3 extraction type

  1. Extract header and content

  2. Extract header only

  3. Extract content only

We will create 3 different functions to achieve these extraction types.

function extractCSVHeadersOnly($file){
    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));
    $keys = array_shift($csv);
    return $keys;
}

function extractCSVContentOnly($file){
    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));
    //remove the headers using the array shift method
    array_shift($csv);
    // return the rest of the array 
    return $csv;
}

function extractAllCSVcontent($file)
{
    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));
    $keys = array_shift($csv);
    $csvArrayData = [];
    foreach ($csv as $i=>$row) {
        $csv[$i] = array_combine($keys, $row);
        array_push($csvArrayData, $csv[$i]);
    }
    return $csvArrayData;
}

In the code above, we can extract the content of a CSV file using five built-in functions in PHP, which are

  1. str_getcsv(): parsing CSV strings into arrays

  2. file(): reads the contents of a file and returns its contents as an array,

  3. array_map(): applies specified callback function to each element of an array

  4. array_shift(): used to remove and return the first element from an array

  5. array_combine(): merges two arrays where one array provides keys, and the other provides values to create key-value pairs.

The file() function reads the content of the CSV and parses each line into an array, and then the str_getcsv() function reads these array of lines where each item of the array is a string of coma separated values and converts each of the coma separated value item into an array item.

The array_map method helps to map the stated functionality of the str_getcsv() to every item the array created from the file content. Hence we have a new array that has all the content of the file as a 2-dimensional array that we can manipulate. the code below what the array data will look like.

//so we will have an array that looks like this
[ 
  ['username','email', 'age', 'address'] // first row 
   ['avikkycodes', 'email@example.com', 24, 'my address is remote'] //second row
]

By employing the array_shift() method, we can eliminate the initial index of the array, which also corresponds to the initial line of the file. Subsequently, we store this array in a variable named "keys," which serves as our header data.

All other data within the array constitutes the content that will follow the header.

Once this extraction process is complete, we proceed to transmit our extracted data to the redirect function, enabling us to pass it to the result page where it will be presented to the user.

The final result of our code will resemble the code provided below.

<?php 
session_start();

$errorMsg = null;
$successMsg = null;

if($_SERVER["REQUEST_METHOD"] === "POST"){

   // pass each input into a variable 

    $file = $_FILES['csfile'];

    $type =  $_POST['type'];

    //validate form inputs
    $validate = validateForm($file, $type);

    if($validate !== true){
        $errorMsg = $validate;
        redirect('error',$errorMsg);
    }

    // prepare file for upload
    $filename = $_FILES['csfile']['name'];
    $targetFolder = dirname( __FILE__, 2).'/uploadedCSV/';
    $filepath = $targetFolder.basename($filename);

    //uploadFile

    if(!uploadFile($file, $filepath)){
        $errorMsg = "<span style='color: red'>Error occured while uploading file</span>";
        redirect('error',$errorMsg);

    }

    switch ($type) {
        case 1:
            $headers = getCSVHeaders($filepath);
            $successMsg = ['type' => 1, 'headers' => $headers] ;
            break;
        case 2:
            $content = extractAllCSVcontent($filepath);
            $successMsg = ['type' => 2, 'content' => $content];
            break;
        case 3:
            $headers = getCSVHeaders($filepath);
            $allContents = extractAllCSVcontent($filepath);
            $successMsg = ['type' => 3, 'headers' =>  $headers, 'content' => $allContents];
            break;

        default:
            $headers = getCSVHeaders($filepath);
            $allContents = extractAllCSVcontent($filepath);
            $successMsg = ['type' => 3, 'headers' =>  $headers, 'content' => $allContents];
            break;
    }

    redirect('success',$successMsg);
}

function uploadFile($file, $filepath) : bool {

    if(move_uploaded_file($file["tmp_name"], $filepath)){
        return true;
    }

    return false;
}

// this is a helper function for debuging response 
function dd($data) {
    die('<pre>'. json_encode( json_decode(json_encode($data), true), true). '</pre>');
}

function validateForm($file, $type){
    //check if input submited are empty
    if(!isset($type) && !isset($file)){
        return "<span style='color:red'> all fields are required</span>";
    }
    if($file['size'] > 20024){
        return  "<span style='color:red'> File size must not be greater than 100kb</span>";
    }

    if($file['type'] !== 'text/csv'){
        return  "<span style='color:red'> File must be a .CSV file</span>";
    }

    return true;

}

function getCSVHeaders($file){
    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));


    $keys = array_shift($csv);

    return $keys;

    //  dd($keys);
}

function getCSVContentOnly($file){
    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));

    //remove the headers using the array shift method
    array_shift($csv);

    // return the rest of the array 
    return $csv;
    // dd($csv);
}

function extractAllCSVcontent($file)
{

    $csv = array_map("str_getcsv", file($file,FILE_SKIP_EMPTY_LINES));


    $keys = array_shift($csv);

    $csvArrayData = [];

    foreach ($csv as $i=>$row) {

        $csv[$i] = array_combine($keys, $row);

        array_push($csvArrayData, $csv[$i]);

    }

    return $csvArrayData;

}

function redirect($resType, $resData){
    if(!in_array($resType, ['error', 'success'])){
        die("<p style='color: red; font-weight: bold'>Response type is undefined</p>");
    };


    if(is_array($resData)){
        $data = json_encode($resData);
    }else{
        $data = $resData;
    }

    $url = $_SERVER['HTTP_ORIGIN']. '/projects/csv_extractor/';   

    $urlparam = '?'. $resType.'='.$resType;

    $page = ($resType == 'success')? 'resultPage.php' : 'index.php' ;

    $url.= $page. $urlparam;

    if($resType == 'error'){
        $_SESSION['error'] =  $data;
    }else{
        $_SESSION['success'] =  $data;
    }


    header("Location: $url");
    exit();
}

If you have read to this point, I want to appreciate your effort in taking the time to digest this article, you are truly wonderful.
You can get the code for this project on my GitHub repo link ---> https://github.com/Avikky/php-csv-extractor

Summary

The development of this project was divided into four crucial components:

  1. Form input validation

  2. Redirection

  3. File upload

  4. Data extraction from uploaded CSV file

Throughout this project, we acquired valuable skills. We learned how to validate forms in PHP, effectively manage response data using PHP sessions and redirection, and gain proficiency in basic file uploading in PHP. The final part, which involved data extraction, allowed us to explore various built-in PHP functions. Notably, the str_getcsv() and file() functions played pivotal roles in extracting data from files. Additionally, we conducted fundamental array manipulations using array_map, array_shift, and array_combine to complete the extraction process.

Conclusion

I believe there is room for further improvement in this codebase. From my perspective, it could benefit from a modification to adopt an object-oriented pattern. However, as I mentioned at the outset, it's essential to maintain simplicity to ensure that beginners can fully leverage it.

I genuinely hope that you have gained valuable insights from this article. If you have, please take a moment to share your thoughts in the comments section, like the article, and share it with your PHP colleagues.

Until we meet again, goodbye.