Import libraries and set viewing options
import pandas as pd
from dbfread import DBF
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.max_columns', 500)
Import Wellhead data from dbf files and convert to dataframes
dbfwh = DBF('D:/Wyoming/Wyoming080219/080119_Wells/080119WH.dbf')
dbfpa = DBF('D:/Wyoming/Wyoming080219/080119_Wells/080119PA.dbf')
dbfwh_df = pd.DataFrame(iter(dbfwh))
dbfpa_df = pd.DataFrame(iter(dbfpa))
Import production for all of Wyoming from their Excel files
County1_5 = pd.read_excel('D:/Wyoming/Wyoming080219/TTL_Prod062819/Counties01-05_062819.xlsx')
County7_17 = pd.read_excel('D:/Wyoming/Wyoming080219/TTL_Prod062819/Counties07-17_062819.xlsx')
County19_33 = pd.read_excel('D:/Wyoming/Wyoming080219/TTL_Prod062819/Counties19-33_062819.xlsx')
County35_45 = pd.read_excel('D:/Wyoming/Wyoming080219/TTL_Prod062819/Counties35-45_062819.xlsx')
Put all of the production together in one dataframe, reset the index, and rearrange the columns so they are more like other oil and gas databases
FullWyProd81019 = pd.concat([County1_5, County7_17, County19_33, County35_45])
FullWyProd81019 = FullWyProd81019.reset_index(drop = True)
FullWyProd810192 = pd.concat([FullWyProd81019.iloc[:,0:3],FullWyProd81019.iloc[:,52:],FullWyProd81019.iloc[:,3:51]], axis=1)
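The iloc/concat pattern above just slices column blocks by position and glues them back in a new order. A toy sketch with hypothetical column names shows the idea:

```python
import pandas as pd

# Toy frame standing in for the full production table (hypothetical columns).
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# Slice column blocks by position and concatenate them in a new order,
# mirroring the rearrangement of the Wyoming production columns above.
reordered = pd.concat([df.iloc[:, 0:1], df.iloc[:, 3:], df.iloc[:, 1:3]], axis=1)
print(list(reordered.columns))  # ['a', 'd', 'b', 'c']
```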
Reshape the production dataframe so months and years run down the rows and the phases for the same timeframe sit side by side
FullWyProd810192_melt = FullWyProd810192.melt(id_vars =["APINO", "COMPANY","YR", "ResCode", "Reservoir"])
FullWyProd810192_melt[["Month", "Phase"]] = FullWyProd810192_melt["variable"].str.split("_", n = 1, expand = True)
FullWyProd810192_melt = FullWyProd810192_melt.drop(["variable"], axis=1)
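After the melt, the `variable` column holds names like `JAN_OIL`; splitting on the first underscore separates month from phase. A minimal check of that split on made-up values:

```python
import pandas as pd

# Column names like "JAN_OIL" carry both the month and the phase.
s = pd.Series(["JAN_OIL", "FEB_GAS", "MAR_WATER"])

# n=1 splits only on the first underscore; expand=True returns two columns.
parts = s.str.split("_", n=1, expand=True)
print(parts[0].tolist())  # ['JAN', 'FEB', 'MAR']
print(parts[1].tolist())  # ['OIL', 'GAS', 'WATER']
```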
months = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}
monthdays = {'JAN': 31.0, 'FEB': 28.0, 'MAR': 31.0, 'APR': 30.0, 'MAY': 31.0, 'JUN': 30.0,
             'JUL': 31.0, 'AUG': 31.0, 'SEP': 30.0, 'OCT': 31.0, 'NOV': 30.0, 'DEC': 31.0}
FullWyProd810192_melt["MonthNum"] = FullWyProd810192_melt['Month'].map(months)
FullWyProd810192_melt['MaxDays'] = FullWyProd810192_melt['Month'].map(monthdays)
FullWyProd810192_melt['ActDaysOn'] = 0
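`Series.map` with a dict substitutes each month abbreviation; keys absent from the dict come back as NaN rather than raising, which is worth knowing if a month string is ever misspelled in the source data. A quick sketch with a fake bad value:

```python
import pandas as pd

months = {"JAN": 1, "FEB": 2, "DEC": 12}
s = pd.Series(["JAN", "DEC", "XXX"])

# The unmatched key "XXX" maps to NaN, so the column silently becomes float.
mapped = s.map(months)
print(mapped.tolist())
```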
Helper function to correct invalid month lengths. Some reported Days_On values exceed the number of days in the month (even more than 31), so the next two blocks find those and cap them at the month's actual length.
def dayclean(row):
    # Cap Days_On at the actual number of days in the month.
    if row['Days_On'] > row['MaxDays']:
        val = row['MaxDays']
    else:
        val = row['Days_On']
    return val
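Row-wise `apply` works, but on a table this size it is slow; the same clamp can be done in one vectorized pass with `np.minimum`. A sketch on toy values (not the author's code):

```python
import pandas as pd
import numpy as np

# Toy rows: February reported with 31 days on, which exceeds MaxDays.
df = pd.DataFrame({"Days_On": [31.0, 15.0], "MaxDays": [28.0, 31.0]})

# np.minimum clamps Days_On to the month length in one vectorized pass,
# matching what dayclean does row by row.
df["ActDaysOn"] = np.minimum(df["Days_On"], df["MaxDays"])
print(df["ActDaysOn"].tolist())  # [28.0, 15.0]
```

On the full dataframe, `np.minimum(FullWyProd810192_final['Days_On'], FullWyProd810192_final['MaxDays'])` would replace the `apply(dayclean, axis=1)` call.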
More shaping of the dataframe, correcting month lengths, dropping unnecessary columns, and sorting values
from functools import reduce
FullWyProdOil = pd.DataFrame(FullWyProd810192_melt[FullWyProd810192_melt['Phase'] == "OIL"]).rename(columns={"value": "Oil_bbls", "Phase": "PhaseO"})
FullWyProdGas = pd.DataFrame(FullWyProd810192_melt[FullWyProd810192_melt['Phase'] == "GAS"]).rename(columns={"value": "Gas_Mcf", "Phase": "PhaseG"})
FullWyProdWater = pd.DataFrame(FullWyProd810192_melt[FullWyProd810192_melt['Phase'] == "WATER"]).rename(columns={"value": "Water_bbls", "Phase": "PhaseW"})
FullWyProddays = pd.DataFrame(FullWyProd810192_melt[FullWyProd810192_melt['Phase'] == "DAYS"]).rename(columns={"value": "Days_On", "Phase": "PhaseD"})
dflist = [FullWyProdOil, FullWyProdGas, FullWyProdWater, FullWyProddays]
FullWyProd810192_final = reduce(
    lambda left, right: pd.merge(left, right,
                                 on=["APINO", "COMPANY", "YR", "ResCode", "Reservoir",
                                     "Month", "MonthNum", "MaxDays", "ActDaysOn"]),
    dflist)
FullWyProd810192_final['ActDaysOn'] = FullWyProd810192_final.apply(dayclean, axis=1)
FullWyProd810192_final = FullWyProd810192_final.drop(
    ["PhaseO", "PhaseG", "PhaseW", "PhaseD", "MaxDays", "Days_On"], axis=1)
FullWyProd810192_final = FullWyProd810192_final.sort_values(by = ["APINO", "YR", "MonthNum"]).reset_index(drop=True)
FullWyProd810192_final = FullWyProd810192_final[["APINO", "COMPANY", "ResCode", "Reservoir", "YR", "Month", "MonthNum", "Oil_bbls", "Gas_Mcf", "Water_bbls", "ActDaysOn"]]
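`functools.reduce` folds `pd.merge` pairwise across the list, so the four phase frames collapse into one wide frame keyed on the shared columns. A toy version with hypothetical single-key frames:

```python
from functools import reduce
import pandas as pd

# Hypothetical per-phase frames sharing an APINO key.
oil = pd.DataFrame({"APINO": ["49-001"], "Oil_bbls": [100]})
gas = pd.DataFrame({"APINO": ["49-001"], "Gas_Mcf": [500]})
water = pd.DataFrame({"APINO": ["49-001"], "Water_bbls": [20]})

# reduce merges left to right: (oil + gas) first, then water.
merged = reduce(lambda left, right: pd.merge(left, right, on="APINO"),
                [oil, gas, water])
print(list(merged.columns))  # ['APINO', 'Oil_bbls', 'Gas_Mcf', 'Water_bbls']
```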
Some reservoir tags are missing from the database, so we retag them as "NOT_AVAILABLE". You don't have to do this; it is something I do so missing values stand out when I am scrolling through quickly.
FullWyProd810192_final['Reservoir'] = FullWyProd810192_final['Reservoir'].fillna("NOT_AVAILABLE")
Standardizing the production data to be all caps.
FullWyProd810192_final = FullWyProd810192_final.applymap(lambda x:x.upper() if type(x) == str else x)
Join the producing well set and the permanently abandoned well set into a complete well header database. Standardize everything to capital letters, and fill NaN unit codes with "NOT_AVAILABLE" to cover the P&A wells.
dbfwh_df = pd.concat([dbfwh_df, dbfpa_df], axis=0, ignore_index=True)
dbfwh_df = dbfwh_df.applymap(lambda x:x.upper() if type(x) == str else x)
dbfwh_df['UNIT_CODE'] = dbfwh_df['UNIT_CODE'].fillna("NOT_AVAILABLE")
A lot of processing has happened to this point, so save your progress and come back to it starting here.
Comment out the to_pickle lines when you are reading a dataframe, and comment out the read_pickle lines when you are saving a dataframe.
FullWyProd810192_final.to_pickle('D:/Wyoming/Wyoming080219/WYProd81219.pkl')
#FullWyProd810192_final = pd.read_pickle("D:/Wyoming/Wyoming080219/WYProd81219.pkl", compression='infer')
dbfwh_df.to_pickle('D:/Wyoming/Wyoming080219/dbfwh_df.pkl')
#dbfwh_df = pd.read_pickle("D:/Wyoming/Wyoming080219/dbfwh_df.pkl", compression='infer')
The next two blocks use geopy with the surface and bottomhole lats/longs to calculate lateral length. This adds to the features available for calculating other values (possibly in conjunction with FracFocus) or for machine learning, such as determining optimal frac designs or better estimating EURs.
!pip install geopy
from geopy import distance
latadddf = []
for i in range(0, len(dbfwh_df)):
    try:
        # Require both longitudes and both latitudes to be valid coordinates.
        # Note: a chained `(a and b) < -1.0` test would only compare one of the
        # two values, since `and` returns an operand, so each is tested explicitly.
        if (dbfwh_df["LON"].iloc[i] < -1.0 and dbfwh_df["BLON"].iloc[i] < -1.0
                and dbfwh_df["LAT"].iloc[i] > 1.0 and dbfwh_df["BLAT"].iloc[i] > 1.0):
            SurfaceHole = (dbfwh_df["LAT"].iloc[i], dbfwh_df["LON"].iloc[i])
            BottomHole = (dbfwh_df["BLAT"].iloc[i], dbfwh_df["BLON"].iloc[i])
            Latfootage = distance.distance(SurfaceHole, BottomHole).ft
            latadddf.append([dbfwh_df["APINO"].iloc[i], Latfootage])
        else:
            latadddf.append([dbfwh_df["APINO"].iloc[i], np.nan])
    except Exception:
        # Missing or malformed coordinates: record the well with no footage.
        latadddf.append([dbfwh_df["APINO"].iloc[i], np.nan])
latadddf = pd.DataFrame(latadddf, columns = ["APINO", "Lateral_Footage"])
latadddf["Lateral_Footage"] = latadddf["Lateral_Footage"].abs()
dbfwh_df = dbfwh_df.merge(latadddf, left_on='APINO', right_on='APINO')
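geopy's `distance.distance` computes a geodesic (ellipsoidal) distance. For intuition, a pure-Python haversine sketch approximates the same surface-to-bottomhole footage on a spherical Earth, so it will differ slightly from geopy's result; the radius constant below is an assumed mean Earth radius in feet:

```python
import math

def haversine_ft(p1, p2, radius_ft=20902231.0):
    # p1, p2 are (lat, lon) in decimal degrees; radius_ft is a mean Earth
    # radius in feet (spherical approximation, unlike geopy's geodesic).
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_ft * math.asin(math.sqrt(a))

# One degree of latitude is roughly 364,000 ft (~69 miles).
d = haversine_ft((43.0, -106.0), (44.0, -106.0))
print(round(d))
```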
Looking at the statistics below, some of the lateral numbers may need cleaning, but for the most part a great deal of them are usable as-is.
dbfwh_df['Lateral_Footage'].describe()
LatLengthScatter = plt.scatter(dbfwh_df['Lateral_Footage'], dbfwh_df.index)
LatLengthScatter;
LatLengthHist = plt.hist(dbfwh_df['Lateral_Footage'], bins=100)
LatLengthHist;
The lateral length column is at the far right of the final well header dataframe below
dbfwh_df
Final production table, with north of 17 million rows
FullWyProd810192_final