Deepfakes Datasets

This is the first post on this blog. To start, I will introduce the datasets that I used in my work. These datasets have been used in the literature to train deepfake detection models.


VIDTIMIT

All videos share a constant format:

  • width 512
  • height 384
  • fps 25
  • frames (54-240)

Identities: 43, each with 10 videos.

Sentences: 10 sentences per identity (one per video), including the following 2 sentences, which are spoken by every person:

  • “She had your dark suit in greasy wash water all year.”
  • “Don’t ask me to carry an oily rag like that.”

The other 8 sentences are randomly selected from a corpus of 238 minus the previous two sentences.

```json
[ "si1010", "si1024", "si1039", "si1084", "si1088", "si1099", "si1105", "si1116", "si1175", "si1178", "si1179", "si1199", "si1218", "si1230", "si1264", "si1265", "si1279", "si1303", "si1360", "si1364", "si1375", "si1386", "si1398", "si1400", "si1425", "si1454", "si1463", "si1469", "si1473", "si1474", "si1490", "si1502", "si1538", "si1539", "si1541", "si1542", "si1543", "si1544", "si1553", "si1555", "si1566", "si1573", "si1587", "si1624", "si1640", "si1653", "si1664", "si1669", "si1714", "si1718", "si1729", "si1735", "si1746", "si1808", "si1825", "si1829", "si1848", "si1894", "si1899", "si1909", "si1933", "si1988", "si2005", "si2016", "si2028", "si2030", "si2084", "si2093", "si2098", "si2099", "si2104", "si2120", "si2128", "si2149", "si2168", "si2169", "si2171", "si2172", "si2173", "si2183", "si2203", "si2222", "si2247", "si2255", "si2270", "si2284", "si2294", "si2299", "si458", "si469", "si475", "si522", "si545", "si548", "si549", "si565", "si569", "si634", "si635", "si639", "si649", "si673", "si728", "si730", "si734", "si745", "si756", "si768", "si770", "si818", "si824", "si833", "si838", "si839", "si844", "si860", "si869", "si908", "si909", "si911", "si912", "si913", "si923", "si943", "si992", "si995", "sx10", "sx100", "sx101", "sx102", "sx103", "sx104", "sx109", "sx11", "sx110", "sx113", "sx115", "sx118", "sx119", "sx1195", "sx12", "sx124", "sx126", "sx13", "sx133", "sx134", "sx138", "sx139", "sx14", "sx140", "sx1625", "sx184", "sx185", "sx188", "sx189", "sx19", "sx190", "sx191", "sx192", "sx193", "sx194", "sx199", "sx20", "sx200", "sx203", "sx205", "sx208", "sx209", "sx214", "sx216", "sx223", "sx224", "sx228", "sx229", "sx23", "sx230", "sx25", "sx274", "sx275", "sx278", "sx279", "sx28", "sx280", "sx281", "sx282", "sx283", "sx284", "sx289", "sx29", "sx290", "sx293", "sx295", "sx298", "sx299", "sx304", "sx306", "sx313", "sx314", "sx318", "sx319", "sx320", "sx34", "sx36", "sx364", "sx365", "sx368", "sx369", "sx370", "sx371", "sx372", "sx373", "sx374", "sx379", "sx380", "sx383", "sx385", "sx388", "sx389", "sx394", "sx396", "sx4", "sx403", "sx404", "sx408", "sx409", "sx410", "sx43", "sx44", "sx48", "sx49", "sx5", "sx50", "sx8", "sx9", "sx94", "sx95", "sx98", "sx99" ]
```

total: 430 videos

label

  • REAL 430 (100.00 %)

fold

  • real 430 (100.00 %)

| label | fold | count | percent |
|---|---|---|---|
| 0 | real | 430 | 100 |

DeepfakeTIMIT

total: 1070 videos

label

  • FAKE 640 (59.81 %)
  • REAL 430 (40.19 %)

fold

  • real 430 (40.19 %)
  • fake_HQ 320 (29.91 %)
  • fake_LQ 320 (29.91 %)

| label | fold | count | percent |
|---|---|---|---|
| 0 | real | 430 | 40.19 |
| 1 | fake_HQ | 320 | 29.91 |
| 1 | fake_LQ | 320 | 29.91 |
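
The count and percent columns used throughout this post can be reproduced from a flat per-video index. The following is a minimal sketch, not the script used for this post: the `summarize` helper and the synthetic `folds` list are assumptions made for illustration, built only from the DeepfakeTIMIT counts listed above.

```python
from collections import Counter


def summarize(records):
    """Return (key, count, percent) rows sorted by descending count."""
    counts = Counter(records)
    total = sum(counts.values())
    return [(key, n, round(100 * n / total, 2)) for key, n in counts.most_common()]


# Hypothetical index: one fold name per video, matching the DeepfakeTIMIT counts.
folds = ["real"] * 430 + ["fake_HQ"] * 320 + ["fake_LQ"] * 320

for fold, count, percent in summarize(folds):
    print(f"{fold} {count} ({percent:.2f} %)")
```

The same helper works for any column (label, fold, split) as long as the index holds one value per video.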

UADFV

total: 98 videos

generated with FakeAPP

label

  • REAL 49 (50.00 %)
  • FAKE 49 (50.00 %)

fold

  • youtube 49 (50.00 %)
  • FakeAPP 49 (50.00 %)

| label | fold | count | percent |
|---|---|---|---|
| 0 | youtube | 49 | 50 |
| 1 | FakeAPP | 49 | 50 |

FaceForensics++:

total: 6000 videos

label

  • FAKE 5000 (83.33 %)
  • REAL 1000 (16.67 %)

fold

  • youtube 1000 (16.67 %)
  • Deepfakes 1000 (16.67 %)
  • Face2Face 1000 (16.67 %)
  • FaceShifter 1000 (16.67 %)
  • FaceSwap 1000 (16.67 %)
  • NeuralTextures 1000 (16.67 %)

split

  • train 4320 (72.00 %)
  • test 840 (14.00 %)
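  • val 840 (14.00 %)

| label | fold | split | count | percent |
|---|---|---|---|---|
| 0 | youtube | test | 140 | 2.33 |
| 0 | youtube | train | 720 | 12 |
| 0 | youtube | val | 140 | 2.33 |
| 1 | Deepfakes | test | 140 | 2.33 |
| 1 | Deepfakes | train | 720 | 12 |
| 1 | Deepfakes | val | 140 | 2.33 |
| 1 | Face2Face | test | 140 | 2.33 |
| 1 | Face2Face | train | 720 | 12 |
| 1 | Face2Face | val | 140 | 2.33 |
| 1 | FaceShifter | test | 140 | 2.33 |
| 1 | FaceShifter | train | 720 | 12 |
| 1 | FaceShifter | val | 140 | 2.33 |
| 1 | FaceSwap | test | 140 | 2.33 |
| 1 | FaceSwap | train | 720 | 12 |
| 1 | FaceSwap | val | 140 | 2.33 |
| 1 | NeuralTextures | test | 140 | 2.33 |
| 1 | NeuralTextures | train | 720 | 12 |
| 1 | NeuralTextures | val | 140 | 2.33 |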

Identities per split (json):

{
"train": [1,2,5,6,7,8,9,10,11,13,14,16,17,18,19,20,21,22,25,27,28,30,31,32,33,34,37,38,39,40,41,43,45,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,65,66,67,68,69,70,71,72,74,75,76,77,79,80,81,82,83,85,86,87,88,89,90,92,93,94,95,96,97,98,99,100,101,103,105,107,108,109,110,111,112,113,117,118,119,120,121,122,123,124,125,127,129,130,131,132,133,136,137,139,140,143,144,146,147,148,149,150,151,152,153,155,156,160,162,163,164,165,166,167,168,171,172,173,174,177,178,179,180,181,182,183,184,185,187,188,189,191,193,194,195,196,197,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,215,216,217,218,221,222,224,225,226,228,229,230,231,232,234,235,236,237,238,239,240,241,242,243,244,246,247,248,250,251,252,253,256,258,259,260,262,264,265,266,267,268,269,270,271,272,274,275,276,277,279,281,282,283,285,286,287,289,290,291,292,293,294,295,296,297,298,300,301,302,303,304,305,307,309,310,311,312,313,315,317,318,320,322,323,324,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,343,344,345,346,348,351,353,355,356,357,359,360,361,363,365,366,368,372,373,374,375,377,378,382,383,385,387,390,391,392,393,394,395,396,397,398,400,401,403,405,406,407,408,409,410,411,412,413,414,415,417,424,426,427,428,430,431,432,433,434,435,436,437,438,439,441,442,444,446,447,448,449,450,451,456,457,458,459,460,461,463,464,466,469,473,474,475,476,477,478,481,484,486,487,489,490,491,495,496,497,498,499,500,501,502,503,504,505,506,508,509,510,512,513,515,516,518,519,520,522,523,524,525,526,528,530,531,532,533,534,535,536,537,539,540,541,542,543,544,545,546,547,549,551,553,554,555,556,559,561,562,563,568,569,570,572,573,574,575,576,577,578,580,581,582,587,588,590,591,592,593,594,596,598,600,601,602,603,604,605,609,610,611,613,614,615,616,618,619,620,621,624,626,627,628,629,631,635,636,637,639,641,642,643,644,645,646,648,651,652,653,654,655,657,658,659,662,663,665,667,671,673,674,676,677,678,679,680,681,684,685,686,687,688,689,690,692,693,694,696,697,698,699,700,703,704,708,709,710,711,712,713,715,716,717,718,719,721,722,723,726,728,730,733,734,735,736,737,738,740,743,744,747,748,749,750,751,752,753,754,756,757,758,760,761,763,764,765,766,767,769,770,771,772,773,774,776,778,780,781,782,783,784,785,786,787,788,789,790,791,792,795,796,797,798,799,800,802,803,804,805,806,808,809,810,811,813,814,815,819,822,825,826,828,829,830,831,833,835,836,838,839,840,841,843,844,845,846,848,849,850,853,854,856,858,859,860,861,863,864,866,867,869,870,871,872,873,874,876,877,878,879,881,882,883,884,885,886,887,888,889,891,892,893,894,897,899,901,902,903,905,907,908,910,911,913,914,916,918,920,921,928,929,930,931,934,935,936,937,938,940,941,942,943,944,946,948,950,952,954,957,959,960,961,962,963,964,965,966,967,968,969,972,975,976,977,978,981,983,984,985,986,987,988,989,990,993,994,996,997,998,999],
"test": [0,3,12,15,24,26,29,35,36,44,47,48,73,78,102,114,128,135,138,141,142,154,158,161,169,170,176,186,190,214,219,220,227,233,249,255,257,278,280,288,306,308,314,319,321,347,352,358,367,371,376,379,380,381,384,386,388,389,399,404,418,420,421,422,423,425,429,445,452,454,462,467,479,480,485,488,494,507,517,521,527,529,550,552,579,607,608,623,625,630,633,634,650,660,661,669,670,675,682,683,691,695,701,705,706,707,714,724,725,731,732,739,741,755,759,801,812,821,842,847,851,855,862,865,868,880,896,906,917,919,924,932,945,949,953,955,970,973,974,995],
"val": [4,23,42,46,64,84,91,104,106,115,116,126,134,145,157,159,175,192,198,223,245,254,261,263,273,284,299,316,325,342,349,350,354,362,364,369,370,402,416,419,440,443,453,455,465,468,470,471,472,482,483,492,493,511,514,538,548,557,558,560,564,565,566,567,571,583,584,585,586,589,595,597,599,606,612,617,622,632,638,640,647,649,656,664,666,668,672,702,720,727,729,742,745,746,762,768,775,777,779,793,794,807,816,817,818,820,823,824,827,832,834,837,852,857,875,890,895,898,900,904,909,912,915,922,923,925,926,927,933,939,947,951,956,958,971,979,980,982,991,992]
}
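
To show how such an identity-to-split mapping might be used, here is a small sketch that looks up the split of a video from its file name. It assumes (this is not stated in the post) that original videos are named like `005.mp4`, that manipulated videos are named like `003_000.mp4`, and that the first identity decides the split. The `split_of` helper and the `excerpt` dictionary are hypothetical names introduced for illustration.

```python
def split_of(video_name, identities_per_split):
    """Look up the split of a video from the first identity in its file name."""
    # Assumed naming: "<id>.mp4" or "<id>_<id>.mp4".
    first_id = int(video_name.split(".")[0].split("_")[0])
    for split, identities in identities_per_split.items():
        if first_id in identities:
            return split
    raise KeyError(f"identity {first_id} is not assigned to any split")


# Tiny excerpt of the identity lists above, for illustration only.
excerpt = {"train": [1, 2, 5], "test": [0, 3, 12], "val": [4, 23, 42]}

print(split_of("003_000.mp4", excerpt))
print(split_of("005.mp4", excerpt))
```

Splitting by identity rather than by video keeps all manipulations of the same person in the same split.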

Celeb-DF-v1:

total: 1203 videos

label

  • 1 795 (66.08 %)
  • 0 408 (33.92 %)

fold

  • Celeb-synthesis 795 (66.08 %)
  • YouTube-real 250 (20.78 %)
  • Celeb-real 158 (13.13 %)

split

  • train 1103 (91.69 %)
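  • test 100 (8.31 %)

| label | fold | split | count | percent |
|---|---|---|---|---|
| 0 | Celeb-real | train | 150 | 12.47 |
| 0 | Celeb-real | test | 8 | 0.67 |
| 0 | YouTube-real | train | 220 | 18.29 |
| 0 | YouTube-real | test | 30 | 2.49 |
| 1 | Celeb-synthesis | train | 733 | 60.93 |
| 1 | Celeb-synthesis | test | 62 | 5.15 |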

identities:

total = 15

[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17]

missing = 3

[5, 14, 15]
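
The missing identities can be derived from the list of identities that are present. This is a minimal sketch, assuming identity ids are meant to be contiguous from 0 up to the largest id present; the `missing_ids` helper is a name introduced here for illustration.

```python
def missing_ids(present):
    """Return the ids absent from the range 0..max(present), in ascending order."""
    return sorted(set(range(max(present) + 1)) - set(present))


# Identities present in Celeb-DF-v1, as listed above.
present = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17]

print(len(present), missing_ids(present))  # 15 [5, 14, 15]
```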

Celeb-DF-v2:

total: 6529 videos

label

  • FAKE 5639 (86.37 %)
  • REAL 890 (13.63 %)

fold

  • Celeb-synthesis 5639 (86.37 %)
  • Celeb-real 590 (9.04 %)
  • YouTube-real 300 (4.59 %)

split

  • train 6011 (92.07 %)
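  • test 518 (7.93 %)

| label | fold | split | count | percent |
|---|---|---|---|---|
| 0 | Celeb-real | train | 482 | 7.38 |
| 0 | Celeb-real | test | 108 | 1.65 |
| 0 | YouTube-real | train | 230 | 3.52 |
| 0 | YouTube-real | test | 70 | 1.07 |
| 1 | Celeb-synthesis | train | 5299 | 81.16 |
| 1 | Celeb-synthesis | test | 340 | 5.21 |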

Identities

total = 59

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]

missing = 3

[14, 15, 18]

Celebrities

total: 59

Gender: male (56.8%) female (43.2%)

age:

≥60 (8.5%); [50, 60] (30.5%); 40s (26.6%); 30s (28.0%); <30 (6.4%)

Ethnic groups: Asians (5.1%) African Americans (6.8%) Caucasians (88.1%)

Deep Fake Detection Dataset

DFD: The Google/Jigsaw DeepFake detection dataset

total: 3431 videos

label

  • 1 3068 (89.42 %)
  • 0 363 (10.58 %)

fold

  • DeepfakeDetection 3068 (89.42 %)
  • actors 363 (10.58 %)

scenes

  • outside_talking_pan_laughing 291 (8.48 %)
  • walk_down_hall_angry 276 (8.04 %)
  • outside_talking_still_laughing 268 (7.81 %)
  • kitchen_pan 263 (7.67 %)
  • talking_against_wall 260 (7.58 %)
  • podium_speech_happy 257 (7.49 %)
  • exit_phone_room 246 (7.17 %)
  • walking_outside_cafe_disgusted 239 (6.97 %)
  • walking_down_street_outside_angry 229 (6.67 %)
  • kitchen_still 225 (6.56 %)
  • talking_angry_couch 198 (5.77 %)
  • secret_conversation 182 (5.30 %)
  • hugging_happy 180 (5.25 %)
  • walking_and_outside_surprised 139 (4.05 %)
  • walking_down_indoor_hall_disgust 121 (3.53 %)
  • meeting_serious 57 (1.66 %)

| scene | count real | percent real | count fake | percent fake |
|---|---|---|---|---|
| exit_phone_room | 28 | 0.82 | 218 | 6.35 |
| hugging_happy | 16 | 0.47 | 164 | 4.78 |
| kitchen_pan | 27 | 0.79 | 236 | 6.88 |
| kitchen_still | 26 | 0.76 | 199 | 5.8 |
| meeting_serious | 4 | 0.12 | 53 | 1.54 |
| outside_talking_pan_laughing | 28 | 0.82 | 263 | 7.67 |
| outside_talking_still_laughing | 27 | 0.79 | 241 | 7.02 |
| podium_speech_happy | 28 | 0.82 | 229 | 6.67 |
| secret_conversation | 18 | 0.52 | 164 | 4.78 |
| talking_against_wall | 27 | 0.79 | 233 | 6.79 |
| talking_angry_couch | 22 | 0.64 | 176 | 5.13 |
| walk_down_hall_angry | 28 | 0.82 | 248 | 7.23 |
| walking_and_outside_surprised | 15 | 0.44 | 124 | 3.61 |
| walking_down_indoor_hall_disgust | 15 | 0.44 | 106 | 3.09 |
| walking_down_street_outside_angry | 28 | 0.82 | 201 | 5.86 |
| walking_outside_cafe_disgusted | 26 | 0.76 | 213 | 6.21 |

DFDC-preview:

total: 5250 videos

label

  • FAKE 4119 (78.46 %)
  • REAL 1131 (21.54 %)

fold

  • method_A 3975 (75.71 %)
  • original_videos 1131 (21.54 %)
  • method_B 144 (2.74 %)

split

  • train 4473 (85.20 %)
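  • test 777 (14.80 %)

| label | fold | split | count | percent |
|---|---|---|---|---|
| 1 | method_A | train | 3501 | 66.69 |
| 1 | method_A | test | 474 | 9.03 |
| 1 | method_B | train | 117 | 2.23 |
| 1 | method_B | test | 27 | 0.51 |
| 0 | original_videos | train | 855 | 16.29 |
| 0 | original_videos | test | 276 | 5.26 |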

DFDC:

Facebook DeepFake Detection Challenge Dataset (DFDC)

total: 128154 videos

label

  • FAKE 104500 (81.54 %)
  • REAL 23654 (18.46 %)

split

  • train 119154 (92.98 %)
  • test 5000 (3.90 %)
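  • val 4000 (3.12 %)

| label | split | count | percent |
|---|---|---|---|
| 0 | test | 2500 | 1.95 |
| 0 | train | 19154 | 14.95 |
| 0 | val | 2000 | 1.56 |
| 1 | test | 2500 | 1.95 |
| 1 | train | 100000 | 78.03 |
| 1 | val | 2000 | 1.56 |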

DeeperForensics-1.0

total: 59475 videos

label

  • REAL 48475 (81.50 %)
  • FAKE 11000 (18.50 %)

split

  • train 41452 (69.70 %)
  • test 12368 (20.80 %)
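  • val 5655 (9.51 %)

| label | split | count | percent |
|---|---|---|---|
| 0 | test | 10157 | 17.08 |
| 0 | train | 33719 | 56.69 |
| 0 | val | 4599 | 7.73 |
| 1 | test | 2211 | 3.72 |
| 1 | train | 7733 | 13 |
| 1 | val | 1056 | 1.78 |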

Perturbation: there are 7 types of distortion, each with 5 intensity levels (transmission errors, compression, etc.):

  • CS: Color Saturation Change
  • BW: Local Block Wise Distortion
  • CC: Color Contrast Change
  • GB: Gaussian blur
  • GNC: White Gaussian Noise in Color Components
  • JPEG: JPEG Compression
  • VC: Video Compression Rate Change

Additionally, 3 mixed distortions are created by combining 2, 3, and 4 of the above distortions, respectively, i.e. {CS, BW}, {CS, BW, CC}, {CS, BW, CC, GB}.

Each forgery video is then augmented using:

  • 1 original forgery
  • 1 forgery postprocessed with the original frames by color matching, warping, affine transformation, etc.
  • 1 random distortion at a random level
  • 5 randomly chosen distortions at level-{1, 2, 3, 4, 5}
  • 3 mix-{2, 3, 4} distortions

total: 11 videos per forgery
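
The augmentation plan can be sketched in a few lines to check the arithmetic. This is only an illustrative enumeration under assumptions, not the DeeperForensics-1.0 pipeline: the `variants` function and its tuple layout are hypothetical, each random choice is drawn independently, and the intensity level of the mixed distortions is left as `None` because it is not specified here.

```python
import random

DISTORTIONS = ["CS", "BW", "CC", "GB", "GNC", "JPEG", "VC"]
MIXES = [["CS", "BW"], ["CS", "BW", "CC"], ["CS", "BW", "CC", "GB"]]


def variants(rng):
    """List the (name, distortions, level) variants produced for one forgery."""
    plan = [("original", None, None), ("postprocessed", None, None)]
    # One random distortion at a random level.
    plan.append(("random level", [rng.choice(DISTORTIONS)], rng.randint(1, 5)))
    # One randomly chosen distortion for each of the 5 levels.
    for level in range(1, 6):
        plan.append((f"level {level}", [rng.choice(DISTORTIONS)], level))
    # Three fixed mixes of 2, 3, and 4 distortions (level not specified here).
    for mix in MIXES:
        plan.append((f"mix {len(mix)}", mix, None))
    return plan


plan = variants(random.Random(0))
print(len(plan))         # 11 variants per forgery
print(1000 * len(plan))  # 11000 FAKE videos for 1000 forgeries
```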

Identities: all 11000 FAKE videos are created using source videos from the DeeperForensics-1.0 dataset, and the target videos are the 1000 pristine videos from the FaceForensics++ dataset.

When keeping only the 140 test identities of FaceForensics++ used as targets for the DeeperForensics-1.0 dataset, we get the following:

label

  • FAKE 1540 = 140 * 11 (100.00 %)

label

  • 1 1540 (100.00 %)

split

  • train 1100 (71.43 %)
  • test 330 (21.43 %)
  • val 110 (7.14 %)

manipulation

  • end to end 140 (9.09 %)
  • reenact postprocess 140 (9.09 %)
  • end to end random level 140 (9.09 %)
  • end to end level 1 140 (9.09 %)
  • end to end level 2 140 (9.09 %)
  • end to end level 3 140 (9.09 %)
  • end to end level 4 140 (9.09 %)
  • end to end level 5 140 (9.09 %)
  • end to end mix 2 distortions 140 (9.09 %)
  • end to end mix 3 distortions 140 (9.09 %)
  • end to end mix 4 distortions 140 (9.09 %)

FFIW10K

total: 19976 videos

label

  • REAL 9988 (50.00 %)
  • FAKE 9988 (50.00 %)

fold

  • web 19976 (100.00 %)

split

  • train 16000 (80.10 %)
  • test 3476 (17.40 %)
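  • val 500 (2.50 %)

| label | fold | split | count | percent |
|---|---|---|---|---|
| 0 | web | train | 8000 | 40.05 |
| 0 | web | val | 250 | 1.25 |
| 0 | web | test | 1738 | 8.7 |
| 1 | web | train | 8000 | 40.05 |
| 1 | web | val | 250 | 1.25 |
| 1 | web | test | 1738 | 8.7 |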

Wilddeepfake

total: 7312 videos

label

  • REAL 3805 (52.04 %)
  • FAKE 3507 (47.96 %)

split

  • train 6506 (88.98 %)
  • test 806 (11.02 %)

| label | split | count | percent |
|---|---|---|---|
| 1 | test | 410 | 5.61 |
| 1 | train | 3097 | 42.36 |
| 0 | test | 396 | 5.42 |
| 0 | train | 3409 | 46.62 |

KoDF

total: 237824 videos

label

  • 1 175660 (73.86 %)
  • 0 62164 (26.14 %)

fold

  • fo 61867 (26.01 %)
  • crowdsourcing 52964 (22.27 %)
  • dffs 37080 (15.59 %)
  • dfl 34464 (14.49 %)
  • fsgan 23888 (10.04 %)
  • audio-driven 18361 (7.72 %)
  • studio 9200 (3.87 %)

sex

  • F 130502 (54.87 %)
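  • M 107322 (45.13 %)

| label | fold | sex | count | percent |
|---|---|---|---|---|
| 0 | crowdsourcing | F | 27023 | 11.36 |
| 0 | crowdsourcing | M | 25941 | 10.91 |
| 0 | studio | F | 4600 | 1.93 |
| 0 | studio | M | 4600 | 1.93 |
| 1 | fo | F | 32615 | 13.71 |
| 1 | fo | M | 29252 | 12.3 |
| 1 | audio-driven | F | 6870 | 2.89 |
| 1 | audio-driven | M | 11491 | 4.83 |
| 1 | dfl | F | 21910 | 9.21 |
| 1 | dfl | M | 12554 | 5.28 |
| 1 | dffs | F | 22575 | 9.49 |
| 1 | dffs | M | 14505 | 6.1 |
| 1 | fsgan | F | 14909 | 6.27 |
| 1 | fsgan | M | 8979 | 3.78 |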

Acronyms:

  • fo: FOMM
  • dffs: FaceSwap
  • dfl: DeepFaceLab
  • fsgan: FSGAN
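  • audio-driven: AudioDriven : {ATFHP, Wav2Lip}

| Age | Count | % |
|---|---|---|
| ∼19 | 5 | 1.24 |
| 20∼29 | 205 | 50.87 |
| 30∼39 | 106 | 26.30 |
| 40∼49 | 61 | 15.14 |
| 50∼59 | 19 | 4.71 |
| 60∼ | 7 | 1.74 |

| Sex | Count | % |
|---|---|---|
| Female | 205 | 50.87 |
| Male | 198 | 49.13 |

| Location | Count | % |
|---|---|---|
| Crowdsourcing | 353 | 87.59 |
| Studio | 50 | 12.41 |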

CDDB

without ProGAN

total: 81847 videos

label

  • 1 40925 (50.00 %)
  • 0 40922 (50.00 %)

family

  • non-gan 37098 (45.33 %)
  • gan 34878 (42.61 %)
  • unknown 9871 (12.06 %)

method

  • glow 12000 (14.66 %)
  • stargan_gf 12000 (14.66 %)
  • crn 10210 (12.47 %)
  • imle 10208 (12.47 %)
  • stylegan 9582 (11.71 %)
  • wild 8271 (10.11 %)
  • gaugan 8000 (9.77 %)
  • deepfake 4330 (5.29 %)
  • biggan 3200 (3.91 %)
  • cyclegan 2096 (2.56 %)
  • whichfaceisreal 1600 (1.95 %)
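  • san 350 (0.43 %)

| label | method | count | percent |
|---|---|---|---|
| 0 | biggan | 1600 | 1.95 |
| 0 | crn | 5105 | 6.24 |
| 0 | cyclegan | 1048 | 1.28 |
| 0 | deepfake | 2165 | 2.65 |
| 0 | gaugan | 4000 | 4.89 |
| 0 | glow | 6000 | 7.33 |
| 0 | imle | 5104 | 6.24 |
| 0 | san | 175 | 0.21 |
| 0 | stargan_gf | 6000 | 7.33 |
| 0 | stylegan | 4791 | 5.85 |
| 0 | whichfaceisreal | 800 | 0.98 |
| 0 | wild | 4134 | 5.05 |
| 1 | biggan | 1600 | 1.95 |
| 1 | crn | 5105 | 6.24 |
| 1 | cyclegan | 1048 | 1.28 |
| 1 | deepfake | 2165 | 2.65 |
| 1 | gaugan | 4000 | 4.89 |
| 1 | glow | 6000 | 7.33 |
| 1 | imle | 5104 | 6.24 |
| 1 | san | 175 | 0.21 |
| 1 | stargan_gf | 6000 | 7.33 |
| 1 | stylegan | 4791 | 5.85 |
| 1 | whichfaceisreal | 800 | 0.98 |
| 1 | wild | 4137 | 5.05 |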