Crime against Women in India
The government data portal of India recently released data on crimes against women reported in different states across India. The data is described at http://data.gov.in/dataset/cases-registered-and-their-disposal-under-crime-against-women-during-2012. The data set contains the number of cases reported, along with past cases and legal proceedings. Crimes against women are categorized by the law under which the case is reported. Here is the complete list of crime categories:
```r
# The download link for the data is given at the address above.
crimeW12 <- read.csv("datasets/crime_women_2012.csv", header = TRUE)
# factor() keeps this working in R >= 4.0, where read.csv no longer turns strings into factors
levels(factor(crimeW12$CRIME.HEAD))
##  "ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY (SECTION 354 IPC)"
##  "CRUELTY BY HUSBAND AND RELATIVES (SECTION 498-A IPC)"
##  "DOWRY DEATH (SECTION 304-B IPC)"
##  "DOWRY PROHIBITION ACT"
##  "IMMORAL TRAFFIC (PREVENTION) ACT"
##  "IMPORTATION OF GIRLS (SECTION 366-B IPC)"
##  "INDECENT REPRESENTATION OF WOMEN (PROHIBITION) ACT"
##  "INSULT TO MODESTY OF WOMEN (SECTION 509 IPC)"
##  "KIDNAPPING AND ABDUCTION (SECTION 363-369, 371-373 IPC)"
##  "RAPE (SECTION 376 IPC)"
##  "SATI PREVENTION ACT"
##  "TOTAL CRIMES "
```
I am only going to work with the number of cases reported during the year, although it might be interesting to explore the other variables as well. First, the data needs some cleaning:
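```r
# Keep column 1 (state) and column 4 (cases reported during the year)
temp <- crimeW12[, c(1, 4)]
# The file stacks one block of 38 rows per crime head; pick out each block as a column
crimeRep1 <- data.frame(State      = temp[1:38, 1],
                        Rape       = temp[1:38, 2],
                        Kidnap     = temp[39:76, 2],
                        DowryDeath = temp[77:114, 2],
                        Assault    = temp[115:152, 2],
                        Insult     = temp[153:190, 2],
                        Husband    = temp[191:228, 2],
                        Import     = temp[229:266, 2],
                        Sati       = temp[267:304, 2],
                        Traffic    = temp[305:342, 2],
                        Indecent   = temp[343:380, 2],
                        Dowry      = temp[381:418, 2],
                        Total      = temp[419:456, 2])
# Drop rows 29, 37 and 38 (presumably aggregate total rows rather than individual states)
crimeRep <- crimeRep1[-c(29, 37, 38), ]
# Sort the states alphabetically
temporder <- crimeRep[order(crimeRep$State), ]
```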
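The hard-coded index ranges above work because the file stacks one block of 38 rows per crime head. As a sketch of a more compact alternative, assuming every block has the same number of rows in the same state order, `matrix()` can split the stacked counts into one column per block, because it fills column by column. The toy data below (regions `A`, `B`, `C` and two crime heads) is hypothetical.

```r
# Hypothetical stacked data: one block of 3 rows (regions) per crime head
long <- data.frame(
  State = rep(c("A", "B", "C"), times = 2),
  Cases = c(10, 20, 30, 1, 2, 3),
  stringsAsFactors = FALSE
)
n_states <- 3                       # rows per block (38 in the real file)
crime_names <- c("Rape", "Kidnap")  # block order as it appears in the file

# Check that the counts split evenly into blocks
stopifnot(nrow(long) == n_states * length(crime_names))

# matrix() fills column by column, so block i becomes column i
wide <- data.frame(
  State = long$State[seq_len(n_states)],
  matrix(long$Cases, nrow = n_states, dimnames = list(NULL, crime_names)),
  stringsAsFactors = FALSE
)
print(wide)
```

Applied to the real file, the same idea would use `temp[, 2]` with `nrow = 38` and the twelve column names listed in the order the blocks appear in the file.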
I now have a data frame with the names of the crimes and the cases reported during the year in each state, with the states ordered alphabetically. I can now visualize this on a map. Below is the state-wise plot of the number of cases reported as rape in 2012.
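```r
library(sp)
library(RColorBrewer)
# GADM level-1 boundaries of India; this loads an object named gadm
load(url("http://biogeo.ucdavis.edu/data/gadm2/R/IND_adm1.RData"))
# Assumes gadm stores its regions in the same alphabetical order as temporder
gadm$Rape <- temporder$Rape
gadm$Total <- temporder$Total
# 9 colours for the 9 intervals defined by the 10 break points in `at`
col <- brewer.pal(n = 9, name = "OrRd")
spplot(gadm, "Rape", col.regions = col,
       main = "Number of Rapes reported by State",
       at = c(0, 50, 100, 200, 500, 1000, 1500, 2000, 2500, 4000))
```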
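In `spplot`, the `at` argument supplies the break points and `col.regions` the colours: ten breaks define nine intervals, one per colour from `brewer.pal(n = 9, ...)`. A base-R sketch of that binning with `findInterval()` on hypothetical counts is below; it only approximates lattice's colour assignment, and values lying exactly on a break point may be handled differently there.

```r
# Break points used for the rape map
breaks <- c(0, 50, 100, 200, 500, 1000, 1500, 2000, 2500, 4000)
n_colours <- length(breaks) - 1   # 9 intervals, so 9 colours are needed

# Hypothetical counts for four regions
counts <- c(2, 75, 1200, 3500)

# Index i of the interval [breaks[i], breaks[i + 1]) containing each count;
# a region with index i would be drawn in col[i]
class_index <- findInterval(counts, breaks)
print(class_index)
```

A count at or above the top break (4000 here) falls outside all nine intervals, so the last break should exceed the largest value being mapped.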
Clustering can show which crimes are closely related to each other. Below is the result of hierarchical clustering of the crimes using complete linkage.
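```r
# Use the state names as row names and drop the State column
row.names(crimeRep) <- crimeRep$State
crimeRep <- subset(crimeRep, select = -State)
# Transpose so that crimes are rows; column 12 (Total) is left out
dt <- dist(t(crimeRep[, -12]))
hcct <- hclust(dt, "complete")
plot(hcct, frame.plot = TRUE, main = "Clustering of Crimes against women in India")
```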
In the plot above, the crimes are given short names so that they fit in the picture. The plot suggests that “Kidnapping and abduction” is very closely linked with “Assault on women with intent to outrage her modesty”. The category “Cruelty by husband and relatives” stands out because it has very high counts in some states. Note that states may differ in how they categorize crimes under the various laws, so this clustering also depends on the local police procedures of each state.
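Complete linkage defines the distance between two clusters as the largest distance between any member of one and any member of the other. A minimal sketch on four hypothetical points on a line, compared with single linkage (which uses the smallest pairwise distance), shows how this choice changes the merge heights that `hclust()` reports.

```r
# Four hypothetical one-dimensional observations
x <- matrix(c(0, 1, 5, 12), ncol = 1,
            dimnames = list(c("A", "B", "C", "D"), NULL))
d <- dist(x)   # Euclidean distances, the dist() default

hc_complete <- hclust(d, method = "complete")
hc_single   <- hclust(d, method = "single")

# Heights at which successive merges happen
print(hc_complete$height)   # 1 5 12
print(hc_single$height)     # 1 4 7

# After A and B merge, complete linkage measures {A, B} to C by the larger
# of d(A, C) = 5 and d(B, C) = 4
dm <- as.matrix(d)
max(dm[c("A", "B"), "C"])   # 5
```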
We can also see which states are close to each other based on the numbers of crimes. The same clustering method with complete linkage is applied here.
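```r
# Cluster the states (rows) on all columns, including Total
d <- dist(crimeRep)
hcc <- hclust(d, "complete")
plot(hcc, frame.plot = TRUE, main = "Clustering of States based on Crimes against Women")
# Outline the 5-cluster cut of the dendrogram
rect.hclust(hcc, k = 5, border = "red")
```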
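`rect.hclust()` only draws boxes on the dendrogram. To get the cluster membership as data, `cutree()` with the same `k` returns one group label per observation, matching the partition that `rect.hclust()` outlines. A sketch on hypothetical counts for six regions `S1` to `S6`, cut into three groups:

```r
# Hypothetical counts: rows are regions, columns are crime heads
counts <- rbind(
  S1 = c(100, 10),     S2 = c(110, 12),
  S3 = c(5000, 900),   S4 = c(5200, 950),
  S5 = c(20000, 3000), S6 = c(1, 0)
)
colnames(counts) <- c("Rape", "Kidnap")

hc <- hclust(dist(counts), method = "complete")
groups <- cutree(hc, k = 3)   # named integer vector: region -> group label

# List the members of each group
split(names(groups), groups)
```

With the real data, `cutree(hcc, k = 5)` would return the five groups outlined in red.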
The state “West Bengal” had the highest number of crimes, followed by “Andhra Pradesh”, “Uttar Pradesh” and “Rajasthan”. “Lakshadweep” had the fewest cases (only 2). Of course, population is likely a significant factor here, as all the states with large numbers of crimes are populous. A large population, however, does not justify a higher number of crimes. This grouping seems to align with the total number of crimes reported in each state. Below is the plot of the total crimes against women in 2012 by state.
```r
spplot(gadm, "Total", col.regions = col,
       main = "Total number of Crimes against Women in 2012",
       at = c(0, 100, 500, 1000, 2500, 5000, 10000, 15000, 25000, 32000))
```
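Because raw counts grow with population, one way to compare states more fairly is a rate per 100,000 people. The sketch below assumes population figures from an external source (such as the census), which are not part of this data set; the function name `rate_per_100k` and all numbers are hypothetical.

```r
# Cases per 100,000 people; population figures must come from another source
rate_per_100k <- function(cases, population) {
  stopifnot(length(cases) == length(population), all(population > 0))
  cases * 1e5 / population
}

# Hypothetical regions: X has more cases, but Y has the higher rate
cases      <- c(X = 1000, Y = 300)
population <- c(X = 1e7,  Y = 1e6)
rates <- rate_per_100k(cases, population)
print(rates)   # X = 10, Y = 30
```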
Much more information could be extracted from this data set; this has been only an elementary analysis. The full code and the plots for the other types of crimes are available at: https://gist.github.com/abhirupkgp/8475006.