Data analysis and visualization are powerful tools that enable us to make sense of complex datasets and communicate insights effectively. In this immersive exploration of real-world conflict data, we delve deep into the gritty realities and complexities of conflicts. Our focus is on Manipur, a state in northeastern India that has unfortunately been marred by prolonged violence and unrest. Using the Armed Conflict Location & Event Data Project (ACLED) dataset [1], we embark on an in-depth data analysis journey to uncover the multifaceted nature of the conflicts.
This article was published as a part of the Data Science Blogathon.
No specific organization or entity is responsible for the analysis and interpretation presented in this blog. The aim is purely to showcase the potential of data science in conflict analysis. Furthermore, no personal interests or biases are involved in these findings, ensuring an objective approach to understanding conflict dynamics. The goal is to promote the use of data-driven methods as a tool for enhancing insights and informing broader discussions on conflict analysis.
By leveraging data science techniques on the ACLED dataset, we can extract insights that not only contribute to understanding the situation in Manipur but also shed light on the humanitarian aspects associated with the violence. The ACLED Codebook is a comprehensive reference guide that provides detailed information about the coding scheme and variables used in this dataset [2].
ACLED’s importance lies in its empathetic data analysis, which enhances our understanding of Manipur violence, illuminates humanitarian needs, and contributes to addressing and mitigating violence. It promotes a peaceful and inclusive future for affected communities.
Through this data-driven analysis, we can not only uncover valuable insights but also highlight the human cost of the Manipur violence. By scrutinizing the ACLED data, I hope we can shed light on the impact on civilian populations, forced displacement, and access to essential services, thereby painting a comprehensive picture of the humanitarian realities faced in the region.
As a first step, we will explore the events of conflict in Manipur using the ACLED dataset. The code snippet given below reads the ACLED dataset for India and filters the data specifically for Manipur, resulting in a filtered dataset with a shape of (number of rows, number of columns). The shape of the filtered data is then printed.
import pandas as pd
#import country specific csv downloaded from acleddata.com
file_path = './acled_India.csv'
all_data = pd.read_csv(file_path)
# Filter the data for Manipur
df_filtered = all_data.loc[all_data['admin1'] == "Manipur"]
shape = df_filtered.shape
print("Filtered Data Shape:", shape)
#Output:
#Filtered Data Shape: (4495, 31)
The number of rows in the ACLED data represents the number of individual events or incidents recorded in the dataset. Each row typically corresponds to a specific event, such as a conflict, protest, or violence occurrence, and contains various attributes or columns that provide information about the event, such as the location, date, actors involved, and other relevant details.
By counting the number of rows in the ACLED dataset, you can determine the total number of recorded events or incidents in the data. By filtering the dataset specifically for Manipur, we obtained a filtered dataset containing information about individual events or incidents recorded from January 2016 to June 9, 2023. The total number of recorded events or incidents in Manipur, which stood at 4495 rows, provided insights into the scope and scale of the conflict or events tracked by ACLED.
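To confirm the period that a filtered extract actually covers, we can parse `event_date` and look at its minimum and maximum. The sketch below uses a tiny made-up frame in place of the real file; the rows and the `"%d %B %Y"` date format are assumptions for illustration, so check the format in your own download before reusing it.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the ACLED export (not real data)
sample = pd.DataFrame({
    "event_date": ["01 January 2016", "15 March 2020", "09 June 2023"],
    "admin1": ["Manipur", "Assam", "Manipur"],
})

# Same filter as in the article, applied to the toy frame
manipur = sample.loc[sample["admin1"] == "Manipur"].copy()
manipur["event_date"] = pd.to_datetime(manipur["event_date"], format="%d %B %Y")

first_event = manipur["event_date"].min()
last_event = manipur["event_date"].max()
print(len(manipur), first_event.date(), last_event.date())
```

Running the same two lines on `df_filtered` is a quick way to verify the coverage period before interpreting the row count.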
As a next step, we calculate the sum of null values along the columns (axis=0) in the df_filtered DataFrame. It provides insights into the count of missing values in each column of the filtered dataset.
df_filtered.isnull().sum(axis = 0)
# Output: count of null values in each column
# event_id_cnty: 0 null values
# event_date: 0 null values
# year: 0 null values
# time_precision: 0 null values
# disorder_type: 0 null values
# event_type: 0 null values
# sub_event_type: 0 null values
# actor1: 0 null values
# assoc_actor_1: 1887 null values
# inter1: 0 null values
# actor2: 3342 null values
# assoc_actor_2: 4140 null values
# inter2: 0 null values
# interaction: 0 null values
# civilian_targeting: 4153 null values
# iso: 0 null values
# region: 0 null values
# country: 0 null values
# admin1: 0 null values
# admin2: 0 null values
# admin3: 0 null values
# location: 0 null values
# latitude: 0 null values
# longitude: 0 null values
# geo_precision: 0 null values
# source: 0 null values
# source_scale: 0 null values
# notes: 0 null values
# fatalities: 0 null values
# tags: 1699 null values
# timestamp: 0 null values
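Raw null counts are easier to compare when they are shown next to the share of rows they represent. The following is a minimal sketch on a made-up frame (the values are illustrative, not taken from ACLED) that keeps only the columns with missing entries and sorts them.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking sparsely filled ACLED columns (not real data)
sample = pd.DataFrame({
    "actor1": ["A", "B", "C", "D"],
    "actor2": ["X", np.nan, np.nan, np.nan],
    "civilian_targeting": [np.nan, "Civilian targeting", np.nan, np.nan],
})

null_counts = sample.isnull().sum()
null_share = (sample.isnull().mean() * 100).round(1)

summary = pd.DataFrame({"nulls": null_counts, "percent": null_share})
# Keep only columns that have at least one missing value, largest first
summary = summary[summary["nulls"] > 0].sort_values("nulls", ascending=False)
print(summary)
```

In this dataset a blank is often informative rather than an error: an empty `actor2` usually means the event had a single actor, and an empty `civilian_targeting` likely means civilians were not the target, which would fit the single unique value that column shows in the next step.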
The code snippet below outputs the number of unique values in each column.
n = df_filtered.nunique(axis=0)
print("No.of.unique values in each column:\n", n)
# Output:
# No.of.unique values in each column:
# event_id_cnty: 4495
# event_date: 1695
# year: 8
# time_precision: 3
# disorder_type: 4
# event_type: 6
# sub_event_type: 17
# actor1: 66
# assoc_actor_1: 323
# inter1: 8
# actor2: 61
# assoc_actor_2: 122
# inter2: 9
# interaction: 28
# civilian_targeting: 1
# iso: 1
# region: 1
# country: 1
# admin1: 1
# admin2: 16
# admin3: 37
# location: 495
# latitude: 485
# longitude: 480
# geo_precision: 3
# source: 233
# source_scale: 12
# notes: 4462
# fatalities: 10
# tags: 97
# timestamp: 1070
Manipur is geographically divided into two distinct regions: the valley region and the hilly region. The valley region, located in the central part of Manipur, is relatively flat and surrounded by hills. It is the most densely populated and agriculturally productive area of the state. The hilly region, on the other hand, comprises the surrounding hills and mountains, offering a more rugged and mountainous terrain.
The code given below creates an interactive map using Folium library to visualize the ACLED events that occurred in Manipur during the years 2022 and 2023. It plots the events as circle markers on the map, with each marker’s color representing the corresponding year. It also adds a GeoJSON layer to display Manipur’s boundaries and includes a map title, credits, and a legend indicating the color codes for the years. The final map is displayed with all these elements.
import folium

# Filter the data for the years 2022 and 2023
df_filtered22_23 = df_filtered[df_filtered['year'].isin([2022, 2023])]

# Create a map instance
map = folium.Map(location=[24.8170, 93.9368], zoom_start=8)

# Path to the GeoJSON file with Manipur boundaries
manipur_geojson = 'Manipur.geojson'

# Create a GeoJSON layer for Manipur boundaries and add it to the map
folium.GeoJson(manipur_geojson,
               style_function=lambda feature: {
                   'fillColor': 'white',
                   'color': 'black',
                   'weight': 2,
                   'fillOpacity': 1
               }).add_to(map)

# Define color palette for different years
color_palette = {2022: 'red', 2023: 'blue'}

# Plot the events on the map with different colors based on the year
for index, row in df_filtered22_23.iterrows():
    folium.CircleMarker([row['latitude'], row['longitude']],
                        radius=3,
                        color=color_palette[row['year']],
                        fill=True,
                        fill_color=color_palette[row['year']],
                        fill_opacity=0.5).add_to(map)

# Add map features
folium.TileLayer('cartodbpositron').add_to(map)

# Fit the view to the plotted layers
map.fit_bounds(map.get_bounds())

# Build a legend that maps marker colors to years
legend_html = """
<div style="position: fixed; bottom: 50px; right: 50px; z-index: 1000; font-size: 14px;
background-color: rgba(255, 255, 255, 0.8); padding: 10px; border-radius: 5px;">
<p><strong>Legend</strong></p>
<p><span style="color: red;">2022</span></p>
<p><span style="color: blue;">2023</span></p>
</div>
"""
map.get_root().html.add_child(folium.Element(legend_html))

# Display the map
map
Output:
You can see that a higher concentration of events is observed in the central valley region. This may be due to various factors such as population density, infrastructure, accessibility, and historical socio-political dynamics. The central valley region, being more densely populated and economically developed, could potentially witness more incidents and events compared to the hilly areas.
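The visual impression from the map can be backed with numbers by counting events per district (`admin2`). This is only a sketch on made-up rows; the district names are used as illustrative `admin2` values and the counts are not real.

```python
import pandas as pd

# Hypothetical events; district names serve only as illustrative admin2 values
sample = pd.DataFrame({
    "admin2": ["Imphal West", "Imphal West", "Imphal East",
               "Churachandpur", "Imphal West", "Ukhrul"],
    "year": [2022, 2023, 2023, 2023, 2022, 2022],
})

# Events per district, and each district's share of all events in percent
district_counts = sample["admin2"].value_counts()
district_share = (district_counts / district_counts.sum() * 100).round(1)
print(district_share)
```

Applied to `df_filtered22_23`, the same two lines would show how much of the recorded activity falls in the valley districts compared with the hill districts.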
ACLED event_type refers to the categorization of different types of events recorded in the ACLED dataset. These event types capture various activities and incidents related to conflicts, violence, protests, and other events of interest. Some event types in the ACLED dataset include violence against civilians, explosions/remote violence, protests, riots, and more. These event types provide insights into the nature and dynamics of conflicts and related incidents recorded in the ACLED database.
The code below generates a bar chart of the event types in Manipur, India, with events grouped by year.
import pandas as pd
import matplotlib.pyplot as plt
df_filteredevent = df_filtered.copy()
df_filteredevent['event_date'] = pd.to_datetime(df_filteredevent['event_date'])
# Group the data by year
df_cross_event = (df_filteredevent
                  .groupby(df_filteredevent['event_date'].dt.year)['event_type']
                  .value_counts()
                  .unstack())
# Define the color palette
color_palette = ['#FF5C5C', '#FFC94C', '#FF9633', '#8E8EE1', '#72C472', '#0818A8']
# Plot the bar chart
fig, ax = plt.subplots(figsize=(10, 6))
df_cross_event.plot.bar(ax=ax, color=color_palette)
# Set the x-axis tick labels to display only the year
ax.set_xticklabels(df_cross_event.index, rotation=0)
# Set the legend
ax.legend(title='Event Types', bbox_to_anchor=(1, 1.02), loc='upper left')
# Set the axis labels and title
ax.set_xlabel('Year')
ax.set_ylabel('Event Count')
ax.set_title('Manipur, India: ACLED Event Types by Year')
# Adjust the padding and layout
plt.tight_layout(rect=[0, 0, 0.95, 1])
# Display the plot
plt.show()
Output:
Notably, the visualization of event types in a bar chart highlighted the dominance of the “Protests” category, which could obscure the relative differences and make it challenging to compare other event types accurately. The visualization was adjusted by excluding or separating the “Protests” category, resulting in a clearer comparison of the remaining event types.
The code snippet below filters out the “Protests” event type from the data. It then groups the remaining events by year and visualizes them in a bar chart, excluding the dominant “Protests” category. The resulting visualization provides a clearer view of the event types by year.
import pandas as pd
import matplotlib.pyplot as plt
df_filteredevent = df_filtered.copy()
df_filteredevent['event_date'] = pd.to_datetime(df_filteredevent['event_date'])
# Filter out the "Protests" event type
df_filteredevent = df_filteredevent[df_filteredevent['event_type'] != 'Protests']
# Group the data by year
df_cross_event = (df_filteredevent
                  .groupby(df_filteredevent['event_date'].dt.year)['event_type']
                  .value_counts()
                  .unstack())
# Define the color palette
color_palette = ['#FF5C5C', '#FFC94C', '#FF9633', '#8E8EE1', '#72C472', '#0818A8']
# Plot the bar chart
fig, ax = plt.subplots(figsize=(10, 6))
df_cross_event.plot.bar(ax=ax, color=color_palette)
# Set the x-axis tick labels to display only the year
ax.set_xticklabels(df_cross_event.index, rotation=0)
# Set the legend
ax.legend(title='Event Types', bbox_to_anchor=(1, 1.02), loc='upper left')
# Set the axis labels and title
ax.set_xlabel('Year')
ax.set_ylabel('Event Count')
ax.set_title('Manipur, India: ACLED Event Types (excluding Protests) by Year')
# Adjust the padding and layout
plt.tight_layout(rect=[0, 0, 0.95, 1])
# Display the plot
plt.show()
Output:
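Besides dropping the dominant category, one option is to look at each event type's share within a year, which puts all years on the same scale. This is a different technique from the filtering used above, sketched here on made-up rows with `pd.crosstab(..., normalize="index")`.

```python
import pandas as pd

# Hypothetical events (not real ACLED rows)
sample = pd.DataFrame({
    "year": [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
    "event_type": ["Protests", "Protests", "Protests", "Battles",
                   "Protests", "Riots", "Battles", "Battles"],
})

# Row-normalized cross-tabulation: each year's shares sum to 1
shares = pd.crosstab(sample["year"], sample["event_type"], normalize="index")
print(shares.round(2))
```

A stacked bar chart of `shares` would then show how the mix of event types shifts between years, independent of the total number of events.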
We use interactive maps to plot events with marker size and color varying by event frequency and event type. The map shows the spatial distribution and intensity of different events, allowing quick identification of patterns, hotspots, and trends. This approach improves our understanding of the geographical dynamics of the events and can support data-driven decision-making, resource allocation, and targeted interventions in response to the identified patterns.
The events are plotted as circle markers on the map, with the color based on the event type and the size based on the event frequency at each location.
import folium
import json
# Filter the data for the year 2023
df_filtered23 = df_filtered[df_filtered['year'] == 2023]
# Calculate the event count for each location
event_counts = (df_filtered23.groupby(['latitude', 'longitude'])
                .size()
                .reset_index(name='count'))
# Create a map instance
map = folium.Map(location=[24.8170, 93.9368], zoom_start=8)
# Load Manipur boundaries from GeoJSON file
with open('Manipur.geojson') as f:
    manipur_geojson = json.load(f)
# Create a GeoJSON layer for Manipur boundaries and add it to the map
folium.GeoJson(manipur_geojson,
style_function=lambda feature: {
'fillColor': 'white',
'color': 'black',
'weight': 2,
'fillOpacity': 1
}).add_to(map)
# Define a custom color palette inspired by ACLED thematic categories
event_type_palette = {
'Violence against civilians': '#FF5C5C', # Dark orange
'Explosions/Remote violence': '#FFC94C', # Bright yellow
'Strategic developments': '#FF9633', # Light orange
'Battles': '#8E8EE1', # Purple
'Protests': '#72C472', # Green
'Riots': '#0818A8' # Zaffre
}
# Plot the events on the map with varying marker size and color based on
# the event type and frequency
for index, row in event_counts.iterrows():
    location = (row['latitude'], row['longitude'])
    count = row['count']
    # Get the event type of the first event recorded at the current location
    event_type = df_filtered23[(df_filtered23['latitude'] == row['latitude']) &
                               (df_filtered23['longitude'] == row['longitude'])
                               ]['event_type'].values[0]
    folium.CircleMarker(
        location=location,
        radius=2 + count * 0.1,
        color=event_type_palette[event_type],
        fill=True,
        fill_color=event_type_palette[event_type],
        fill_opacity=0.7
    ).add_to(map)
# Add legends for the year 2023
legend_html = """
<div style="position: fixed; bottom: 50px; right: 50px; z-index: 1000; font-size: 14px;
background-color: rgba(255, 255, 255, 0.8); padding: 10px; border-radius:
5px;">
<p><strong>Legend</strong></p>
<p><span style="color: #FF5C5C;">Violence against civilians</span></p>
<p><span style="color: #FFC94C;">Explosions/Remote violence</span></p>
<p><span style="color: #FF9633;">Strategic developments</span></p>
<p><span style="color: #8E8EE1;">Battles</span></p>
<p><span style="color: #72C472;">Protests</span></p>
<p><span style="color: #0818A8;">Riots</span></p>
</div>
"""
map.get_root().html.add_child(folium.Element(legend_html))
# Display the map
map
Output:
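Note that the loop above colors each marker by the first event found at a location. If several event types share the same coordinates, one alternative is to color by the most frequent type there. A minimal sketch of that idea on made-up rows (the helper name `most_common` is hypothetical):

```python
import pandas as pd

# Hypothetical events at two locations (not real ACLED rows)
sample = pd.DataFrame({
    "latitude":  [24.81, 24.81, 24.81, 24.33],
    "longitude": [93.94, 93.94, 93.94, 93.70],
    "event_type": ["Riots", "Protests", "Protests", "Battles"],
})

def most_common(values: pd.Series) -> str:
    # value_counts sorts by frequency, so the first index is the modal type
    return values.value_counts().index[0]

# One row per location with its event count and most frequent event type
per_location = (
    sample.groupby(["latitude", "longitude"])["event_type"]
    .agg(count="size", dominant_type=most_common)
    .reset_index()
)
print(per_location)
```

The resulting frame already carries both the count and the type, so the per-row lookup inside the plotting loop would no longer be needed. Ties between event types are broken by whatever order `value_counts` returns.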
In this step, we gain insight into the different entities or groups involved in the conflict or events in Manipur. In the ACLED dataset, “actor1” refers to the primary actor involved in a recorded event. It represents the main entity or group that is responsible for initiating or participating in a specific conflict or event. The “actor1” column provides information about the primary actor’s identity, such as a government, rebel group, ethnic militia, or other entities involved in the conflict or event. Each unique value in the “actor1” column represents a distinct actor or group involved in the recorded events.
We then visualize the value counts of ‘actor1’ using the code snippet below.
The code computes the value counts of the ‘actor1’ column for events before 2023, keeps only the actors with counts greater than or equal to 5, and visualizes the result as a horizontal bar chart.
import matplotlib.pyplot as plt
# Filter the DataFrame based on value counts >= 5
filtered_df = (df_filtered[df_filtered['year'] != 2023]['actor1']
               .value_counts()
               .loc[lambda x: x >= 5])
# Create a figure and axes for the horizontal bar chart
fig, ax = plt.subplots(figsize=(8, 6))
# Define the color palette
color_palette = ['#FF5C5C', '#FFC94C', '#FF9633', '#8E8EE1', '#72C472', '#0818A8']
# Plot the horizontal bar chart
filtered_df.plot.barh(ax=ax, color=color_palette)
# Add labels and title
ax.set_xlabel('Count')
ax.set_ylabel('Actor1')
ax.set_title('Value Counts of Actor1 (>= 5) (January 2016 to 9th December 2022)',
pad=55)
# Set the data availability information
data_info = "Accessed on June 17, 2023"
# Add credits and data availability information
plt.text(0.5, 1.1, "Data accessed from:", ha='center', transform=ax.transAxes,
fontsize=10)
plt.text(0.5, 1.05,
         "Armed Conflict Location & Event Data Project (ACLED); www.acleddata.com",
         ha='center', transform=ax.transAxes, fontsize=10)
plt.text(0.5, 1.0, data_info, ha='center', transform=ax.transAxes, fontsize=10)
# Display the count next to each bar
for i, v in enumerate(filtered_df.values):
    ax.text(v + 3, i, str(v), color='black')
# Display the plots
plt.tight_layout()
plt.show()
Output:
The chart represents data from January 2016 to 9th December 2022. Also, the condition “count greater than or equal to 5” means that only the actors with a frequency of occurrence of 5 or more will be included in the analysis and displayed in the chart.
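To see what such a threshold keeps and what it hides, the short sketch below applies the same `value_counts().loc[...]` pattern to a made-up series. The actor labels and counts are illustrative only.

```python
import pandas as pd

# Hypothetical actor labels (not real ACLED counts)
actors = pd.Series(
    ["Protesters"] * 6 + ["Rioters"] * 5 + ["Police Forces"] * 2 + ["Unidentified"] * 1,
    name="actor1",
)

counts = actors.value_counts()
# Keep only actors that appear at least 5 times
frequent = counts.loc[lambda x: x >= 5]
# Number of events that fall out of the chart because their actor is rare
dropped_events = int(counts.sum() - frequent.sum())

print(frequent)
print(dropped_events)
```

Reporting the number of dropped events alongside the chart makes it clear how much of the data the threshold leaves out.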
The code snippet below compares the ‘actor1’ category counts for 2022 with those for 2023 (data until June 9, 2023) in a grouped bar chart.
import matplotlib.pyplot as plt
import numpy as np
# Count actor1 occurrences in 2022, keeping actors with at least 10 events
filtered_df_2022 = (df_filtered[df_filtered['year'] == 2022]['actor1']
                    .value_counts()
                    .loc[lambda x: x >= 10])
# Count actor1 occurrences in 2023, keeping actors with at least 10 events
filtered_df_2023 = (df_filtered[df_filtered['year'] == 2023]['actor1']
                    .value_counts()
                    .loc[lambda x: x >= 10])
# Get the categories that appear at least 10 times in either year
# (sorted so that the bar order is stable between runs)
categories = sorted(set(filtered_df_2022.index).union(filtered_df_2023.index))
# Create a dictionary to store the category counts
category_counts = {'2022': [], '2023': []}
# Iterate over the categories
for category in categories:
    # Add the count for 2022 if available, otherwise add 0
    # (a count below 10 in that year was filtered out above and shows as 0)
    category_counts['2022'].append(filtered_df_2022.get(category, 0))
    # Add the count for 2023 if available, otherwise add 0
    category_counts['2023'].append(filtered_df_2023.get(category, 0))
# Exclude categories with count 0 in both years
non_zero_categories = [category for category, count_2022, count_2023
                       in zip(categories, category_counts['2022'], category_counts['2023'])
                       if count_2022 > 0 or count_2023 > 0]
# Create a figure and axes for the bar chart
fig, ax = plt.subplots(figsize=(10, 6))
# Set the x-axis positions
x = np.arange(len(non_zero_categories))
# Set the width of the bars
width = 0.35
# Plot the bar chart for 2022
bars_2022 = ax.bar(x - width/2, category_counts['2022'], width, color=color_palette[0],
label='2022')
# Plot the bar chart for 2023
bars_2023 = ax.bar(x + width/2, category_counts['2023'], width, color=color_palette[1],
label='2023')
# Set the x-axis tick labels and rotate them for better visibility
ax.set_xticks(x)
ax.set_xticklabels(non_zero_categories, rotation=90)
# Set the y-axis label
ax.set_ylabel('Count')
# Set the title and legend
ax.set_title('Comparison of Actor1 Categories (>= 10) - 2022 vs 2023')
ax.legend()
# Add count values above each bar
for rect in bars_2022 + bars_2023:
    height = rect.get_height()
    ax.annotate(f'{height}', xy=(rect.get_x() + rect.get_width() / 2, height),
                xytext=(0, 3), textcoords="offset points", ha='center', va='bottom')
# Adjust the spacing between lines
plt.subplots_adjust(top=0.9)
# Display the plot
plt.show()
Output:
The next code snippet prepares the data for further analysis and visualization. It converts the ‘event_date’ column to datetime, then uses the pd.crosstab() function to create a cross-tabulation (frequency table) between ‘event_date’ (converted to monthly periods with dt.to_period()) and the ‘inter1’ column in ‘df_filtered’, and restructures the result for easier interpretation. It then groups the filtered DataFrame by ‘event_date’, sums ‘fatalities’ for each date, and aggregates these sums by month. Finally, the monthly fatalities are added to the cross-tabulated DataFrame, resulting in ‘df_conflicts’, which includes both the categorized event data and the corresponding fatalities information.
import pandas as pd
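# Work on a copy so the assignment below does not trigger SettingWithCopyWarning
df_filtered = df_filtered.copy()
# Convert 'event_date' column to datetime data type
df_filtered['event_date'] = pd.to_datetime(df_filtered['event_date'])
# Perform the crosstab operation on monthly periods
df_cross = pd.crosstab(df_filtered['event_date'].dt.to_period('M'),
                       df_filtered['inter1'])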
# Rename the columns
df_cross.columns = ['State Forces', 'Rebel Groups', 'Political Militias',
'Identity Militias', 'Rioters', 'Protesters', 'Civilians', 'External/Other Forces']
# Convert the period index to date
df_cross['event_date'] = df_cross.index.to_timestamp()
# Reset the index
df_cross.reset_index(drop=True, inplace=True)
df2 = df_filtered.copy()
df2['event_date'] = pd.to_datetime(df2['event_date'])
fatality_filtered = (df2
.filter(['event_date','fatalities'])
.groupby(['event_date'])
.fatalities
.sum()
)
df_fatality_filtered = fatality_filtered.to_frame().reset_index()
df_fatality_month= df_fatality_filtered.resample('M', on="event_date").sum()
df_fatality_month = df_fatality_month.reset_index()
df_fatalities = df_fatality_month.drop(columns=['event_date'])
df_concat = pd.concat([df_cross, df_fatalities], axis=1)
df_conflicts = df_concat.copy()
Output:
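The preparation above joins the two monthly tables by row position, which works as long as every month in the range has at least one event. Resampling creates a row for an empty month while a cross-tabulation does not, so a gap would shift the rows. A hedged alternative is to build both tables on the same monthly key and join on it, as in this sketch with made-up rows:

```python
import pandas as pd

# Hypothetical events with a gap in February (not real ACLED rows)
sample = pd.DataFrame({
    "event_date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-03-02", "2023-03-15"]),
    "inter1": [6, 1, 6, 4],
    "fatalities": [0, 2, 0, 5],
})

# Month-start timestamp used as the shared key for both tables
month = sample["event_date"].dt.to_period("M").dt.to_timestamp()

events = pd.crosstab(month, sample["inter1"])
fatalities = sample.groupby(month)["fatalities"].sum()

# Join on the monthly index, so a month without events cannot shift rows
monthly = events.join(fatalities)
print(monthly)
```

Here the columns of `events` are still the numeric `inter1` codes; they can be renamed to actor labels afterwards exactly as in the article's snippet.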
The code below visualizes conflict intensity for monthly events in Manipur, categorized by actor type and weighted by reported fatalities. The width of the shaded band around each line reflects the total fatalities reported in that month (drawn as the event count plus and minus fatalities/5). This kind of analysis helps us identify patterns and the relative impact of the different actor types involved in the conflicts, and can provide valuable insights for further analysis and decision-making in conflict studies.
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(
name='State Forces',
x=df_conflicts['event_date'].dt.strftime('%Y-%m'),
y=df_conflicts['State Forces'],
mode='markers+lines',
marker=dict(color='darkviolet', size=4),
showlegend=True
))
fig.add_trace(go.Scatter(
name='Fatality weight',
x=df_conflicts['event_date'],
y=df_conflicts['State Forces']+df_conflicts['fatalities']/5,
mode='lines',
marker=dict(color="#444"),
line=dict(width=1),
hoverinfo='skip',
showlegend=False
))
fig.add_trace(go.Scatter(
name='Fatality weight',
x=df_conflicts['event_date'],
y=df_conflicts['State Forces']-df_conflicts['fatalities']/5,
marker=dict(color="#444"),
line=dict(width=1),
mode='lines',
fillcolor='rgba(68, 68, 68, 0.3)',
fill='tonexty',
hoverinfo='text',
hovertemplate='<br>%{x|%b\n%Y}<br><i>Fatalities: %{text}</i>',
text=['{}'.format(i) for i in df_conflicts['fatalities']],
showlegend=False
))
# Similarly, add the three traces for each of the other actor types here...
fig.update_xaxes(
dtick="M3", # Set the tick frequency to 3 months (quarterly)
tickformat="%b\n%Y"
)
fig.update_layout(
yaxis_title='No: of Events',
title={
'text': 'Conflict Intensity Analysis: Events in Manipur categorized '
        'by actor type, weighted by reported fatalities',
'y': 0.95,
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'font': {'size': 20}
},
annotations=[
dict(
text="January 2016 to June 9th 2023|Data accessed from: "
     "Armed Conflict Location & Event Data Project (ACLED),www.acleddata.com",
xref="paper",
yref="paper",
x=0.5,
y=1.06,
showarrow=False,
font={'size': 12}
)
],
hovermode="x",
xaxis=dict(
showgrid=False
),
yaxis=dict(
showgrid=False
)
)
# Each actor type contributes three traces (line plus two band edges), so the
# marker traces sit at positions 0, 3, 6, ...; set their marker size uniformly
for trace in fig.data[::3]:
    trace.marker.size = 4
fig.show()
Output:
When one variable (in this case ‘Protesters’) takes much higher values than the others in a multiline graph, it can distort perception and make it difficult to compare and interpret the trends of the different variables. The dominance of one variable reduces readability, as it becomes challenging to assess the relative changes and relationships between the other variables. The visualization may suffer from compressed or cluttered visuals, loss of detail in lower-valued variables, and an unbalanced emphasis that may bias interpretations.
To mitigate these drawbacks and obtain a clearer view of recent conflict intensity, we filtered the data to the 2022 and 2023 conflict events; the output is shown below:
For conflict trend analysis on the daily data, we set the date as the index and obtain the dataframe below for further analysis.
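One way such a date-indexed daily frame could be built is sketched below on made-up rows. The name `data_ts_sketch` and the code-to-label mapping are assumptions for illustration (the mapping follows the column order used earlier in this article); this is not necessarily how the frame used in the next step was produced.

```python
import pandas as pd

# Hypothetical events (not real ACLED rows)
sample = pd.DataFrame({
    "event_date": pd.to_datetime(["2023-05-03", "2023-05-03", "2023-05-06"]),
    "inter1": [6, 5, 6],
})

# Assumed mapping from inter1 codes to the labels used in this article
labels = {5: "Rioters", 6: "Protesters"}

# Daily event counts per actor type, indexed by date
daily = pd.crosstab(sample["event_date"], sample["inter1"]).rename(columns=labels)

# Insert days without recorded events as zero rows, so that a 7-row
# rolling window later corresponds to 7 calendar days
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
data_ts_sketch = daily.reindex(full_range, fill_value=0)
print(data_ts_sketch)
```

Filling the missing days matters because a row-based rolling window counts rows, not days; without the zero rows, quiet periods would be skipped and the averages inflated.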
In conflict trend analysis, the 30-day and 7-day rolling windows are common. They are used to calculate rolling averages or means of conflict-related data over a specific time period.
The rolling window refers to a fixed-size time interval that moves along the timeline, including a specified number of data points within that interval. For example, in a 30-day rolling window, the interval includes the current day plus the previous 29 days. In a 7-day rolling window, the interval includes the current day plus the previous 6 days, representing a week’s worth of data.
The moving average is calculated by taking the average of the data points within the window. It provides a smoothed representation of the data, reducing short-term fluctuations and highlighting longer-term trends.
By calculating the 30-day and 7-day rolling means in conflict analysis, analysts can gain insights into the overall patterns and trends in conflict events over time. It can identify longer-term trends while also capturing shorter-term fluctuations in the data. These rolling averages can help reveal underlying patterns and provide a clearer picture of the evolution of conflict dynamics.
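A small worked example makes the smoothing effect concrete. The sketch below uses a made-up daily series and, for brevity, a 3-day window next to the 7-day one; `min_periods=1` lets the average start from the first day instead of waiting for a full window.

```python
import pandas as pd

# Hypothetical daily event counts with two spikes (not real data)
days = pd.date_range("2023-05-01", periods=7, freq="D")
events = pd.Series([0, 0, 7, 0, 0, 0, 7], index=days)

# Each value is the mean of the current day and the preceding days in the window
roll_3d = events.rolling(window=3, min_periods=1).mean()
roll_7d = events.rolling(window=7, min_periods=1).mean()

print(pd.DataFrame({"events": events, "3-d mean": roll_3d, "7-d mean": roll_7d}))
```

On the last day the 7-day mean is (0 + 0 + 7 + 0 + 0 + 0 + 7) / 7 = 2, while the shorter window reacts more sharply to each spike. The same contrast appears between the 7-day and 30-day curves on the real data.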
The code snippet below creates a plot for each actor type.
import matplotlib.pyplot as plt
import pandas as pd

# data_ts (daily event counts per actor type, indexed by date), main_title
# and sub_title are assumed to be defined earlier in the notebook.

# Variables to calculate rolling means for
variables = ['State Forces', 'Rebel Groups', 'Political Militias',
             'Identity Militias', 'Rioters', 'Protesters',
             'Civilians', 'External/Other Forces']

# Calculate rolling means for each variable
data_7d_rol = {}
data_30d_rol = {}
for variable in variables:
    data_7d_rol[variable] = data_ts[variable].rolling(window=7, min_periods=1).mean()
    data_30d_rol[variable] = data_ts[variable].rolling(window=30, min_periods=1).mean()

# Plotting separate graphs for each variable
for variable in variables:
    fig, ax = plt.subplots(figsize=(11, 4))
    # Plotting 7-day rolling mean
    ax.plot(data_ts.index, data_7d_rol[variable], linewidth=2, label='7-d Rolling Mean')
    # Plotting 30-day rolling mean
    ax.plot(data_ts.index, data_30d_rol[variable], color='0.2', linewidth=3,
            label='30-d Rolling Mean')
    # Beautification of plot
    ax.legend()
    ax.set_xlabel('Year')
    ax.set_ylabel('Events accounted by ' + variable)
    ax.set_title('Trends in ' + variable + ' Conflicts')
    # Add main heading and subheading
    fig.suptitle(main_title, fontsize=14, fontweight='bold', y=1.05)
    ax.text(0.5, 0.95, sub_title, transform=ax.transAxes, fontsize=10, color='red', ha='center')
    plt.tight_layout()
    plt.show()
Output:
Note: The plots generated and data analysis conducted in this blog are solely for the purpose of demonstrating the application of data science techniques. These analyses do not draw definitive conclusions or interpretations regarding the complex dynamics of conflicts. Approach conflict analysis with caution, recognizing the multifaceted nature of conflicts and the need for comprehensive and context-specific understanding beyond the scope of this analysis.
The blog explores the events and patterns of conflict in Manipur, India through an analysis of the ACLED data. To visualize the ACLED events in Manipur, we used interactive maps and other visualizations. Analyzing the event types in Manipur revealed various activities and incidents related to conflicts, violence, protests, and other events of interest. To understand the trends in conflict events, we calculated the 30-day and 7-day rolling means. These rolling averages provided a smoothed representation of the data, reducing short-term fluctuations and highlighting longer-term trends. Overall, these findings may contribute to a better understanding of the conflict dynamics in the region and can support further research and decision-making processes.
Hope you found this article informative. Feel free to reach out to me on LinkedIn. Let’s connect and work towards leveraging data for positive change.
A. The ACLED (Armed Conflict Location & Event Data Project) dataset is a comprehensive resource that tracks and records detailed information about conflict events worldwide, including political violence, protests, and riots. It contributes to analyzing conflict events by providing researchers and policymakers with valuable insights into the patterns, dynamics, and actors involved, aiding in informed decision-making and conflict-related research.
A. Interactive maps and visualizations allow for the exploration and analysis of spatial and temporal patterns of conflicts by providing a visual representation of data that enables the identification of trends, hotspots, and correlations, enhancing the understanding of conflict dynamics.
A. It is important to carefully visualize and compare event types, especially when one category dominates the dataset, to avoid overshadowing the relative differences and accurately assess the significance and dynamics of other event types.
A. Identifying and analyzing the primary actors involved in conflicts provides insights into the key entities and groups responsible for initiating or participating in the events, helping to understand the dynamics, motivations, and potential interactions between different actors.
A. Rolling mean calculations provide a smoothed representation of conflict incidents by averaging data points over a specific time window, enabling the identification of both short-term fluctuations and long-term trends in the data.
1. Raleigh, Clionadh, Andrew Linke, Håvard Hegre and Joakim Karlsen. (2010). “Introducing ACLED-Armed Conflict Location and Event Data.” Journal of Peace Research 47(5) 651-660.
2. ACLED. (2023). “Armed Conflict Location & Event Data Project (ACLED) Codebook, 2023.”
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.